                                 Init.   Opt.
                       Length   Score  Score    Sig.  Frame

l leukaemia type II      9748    1872   2186   39.22
l lymphotropic viru      9749    1872   2186   39.22
standard deviations above mean ****
adenopathy virus (E      9176    1246   1904   25.82
standard deviations above mean ****
adenopathy virus (M      9229     916   2060   18.76
standard deviations above mean ****
odeficiency virus t      9671     306   1194    5.70
V-4) partial provir      5391     299   1257    5.55
nodeficiency virus       9264     294   1250    5.44
nodeficiency virus       9646     290   1247    5.35
standard deviations above mean ****
unodeficiency virus      1142     259    565    4.69
standard deviations above mean ****
structure of the art      306     184    298    3.08

standard deviations t
structure of the art


a.' ere- sorted by optimized score. 
1. KUNZ-158-CL39.SEQ

REHTLV3 Human T-cell leukaemia type III (HTLV-III) provira

ID   REHTLV3 standard; 3NA; 9748 BP.

XX 

AC   X01762;

XX 

DT   03-SEP-left? (an correction)
DT   01-SEP-i'-’H7 (an correction)
DT   03-AUG-1 uHV (an correction)
DT   29-OCT CrYO (minor modification)
DT   CS-NOV- 'l'Pr (KW added)
DT   26-MAR-1985 (first entry)

XX 

DE   Human T-cell leukaemia type III (HTLV-III) proviral genome
DE   (AIDS virus for acquired immune deficiency syndrome)

XX 

KW   acquired immune deficiency syndrome; direct repeat; endonuclease;
KW   glycoprotein; inverted repeat; protease; provirus;
KW   reverse transcriptase; terminal repeat.

XX 

OS   Human T-cell leukemia virus type III.
OC   Viridae; ss-RNA enveloped viruses; Retroviridae.

XX 

RN   [1] (bases 1-9748)
RA   Ratner L., Haseltine W., Patarca R., Livak K.J., Starcich B.R.,
RA   Josephs S.F., Doran E.R., Rafalski J.A., Whitehorn E.A.,
RA   Baumeister K., Ivanoff L., Petteway S.R. Jr., Pearson M.L.,
RA   Lautenberger J.A., Papas T.S., Ghrayeb J., Chang N.T., Gallo R.C.,
RA   Wong-Staal F.;
RT   "Complete nucleotide sequence of the AIDS virus, HTLV-III";
RL   Nature 313:277-284(1985).
XX 

RN   [2]
RA   Muesing M.A., Smith D.H., Cabradilla C.D., Benton C.V.,
RA   Lasky L.A., Capon D.J.;
RT   "Nucleic acid structure and expression of the human AIDS/
RT   lymphadenopathy retrovirus";
RL   Nature 313:450-458(1985).

XX 
FH

FT   INVREP 1 2 inverted repeat

FT   SITE 634 long terminal repeat

FT Pk J M 4,-7 .430 _ TYVE£b-biI£— ———— 
U3 region

FT   CAP 454 454

FT   SITE 552 634 U5 region

653
tRNA binding site (tRNA-Lys)

put. retroviral nucleic acid binding protein (NBP) (ret-2)

pol precursor polypeptides
2176
SOR short open reading frame
env-lor precursor polypeptide
8821
envelope glycoprotein
7787
put. peptide cleavage site
8821
put. lor transmembrane
polypurine stretch
9646
polyadenylation signal
r iff
0V‘ 'O
CTATTATGGGGTACCTGTGTGGAAGGAAGCAACCACCACTCTATTTTG


8350 

43f 

60 

5370 6580 6390 6400 
1. so

•
240


CATC AG AT! ifT; 

AA40U' 

Tf AT RATA 

- CAGAGS-TACATA - AT — GTTTGGGCCACACATGCCTG — T 

1 1 1 1 1 1 1 I 1 » 1 III 1 
1 1 1 t 1 . t 1 < < . 1 1 . 1 1 < • 1

AAGCATATGATACAGAGGTACATAATGTTTGGGC-CACA-

—CATGCCT 



■■ 

o ; ; 

;60 5440 0450 6460 

6470 




:>5G 

GTAi5 


■ - v) 270 280 230 300 310 

•ViCCGAGAAGAAiTTAGT ATTGGT AAATGTGACAGAAAATTTT AACATQTGGAAAA- 

... , . - i i i i i iii i i i i i 


GTJTI f ; iCICCi''tCi-iC. 'ACi ■* "‘GAiAGO— C-Pr -CAA.GAA.G rA-GTA—TTG 


.■•'•*.30 

v ; j . 

-ATiiiACAT': 


6510 


-GTAAAT-GTGACA-GAAAAT 

6520 


rAP.uA' 


6530 

380 


'AA'i':A"A 
r.-;f50 


540 350 360 370 

.-ccvv.N-rp, i ftOTCAG-TTTATG-GGATCAAAGCCTAAAGCCATGTG-T 
‘: : : : : : : : : 

—'■n !T: TAl-'AA-!":AGATGCATGAGGATATAA-TCAGTTTATGGGAT 

6560 6570 6580 6590 


, 0 - r y-.O 45 0 420 430 440 

_*pGCA' . rr.T:-;Vi-'i"i'A6T"!'1 Aft~0G'l GCACTbiATTTGG-GGAATGCTACTAAT ACCAA 

.7 ;iT, , ” .... : ;:: :: : : :::::::: : :: 

r V7 v'-ya-- 6GP -7i■„ HVf 40 A;'-Yi TAACCCCACTCTGTGTTAGTTTAAAGTGC—ACTGATTTGAAGAA 

Vo.: 6610 6620 6630 6640 6650 6660 

J.jTiO j 470 480 430 500 510 

T/N ~-, <k -'r-i ,v' O-x,..:-:• - -.■■<■£> *•'•;—f'n kAA.AaGATGATGGAuAAAGGAGAGATAAAAAACTGCTCTTTCAA 

1 *'•' ■ ‘‘. |f ; ;7 i ; ; ; ; ; ; ; ; ■ ; ; ; ; ; ; ; ; : 1 1 : i : I ! 1 ! \ \ : ! ! ! i 

>' A k'a rAb:‘i'A4L'c;»GGAGAATGATAATGGAGAAAGGAGAGiAT AAAAAACTGCTCTTTCAA 
0380 SS90 6700 6710 6720 6730 


TGATACTAATA.'P 

oc..70 


rATrAGGAGOA^HMi AAGAC iGTA,-' 


550 560 570 580 

■TC:C. AG ft.AAG A AT A'l GCATTTTTTT AT A A ACTTG AT AT AAT ACCAAT 

......... 4 I 1 > I I ■ I I 1 I I I t 1 I 


| | | I I I I I I I I I * • 


TA"!'' Jl -:SCi-‘.C; V-6-1 ..'A !- 
6740 6*7 


^;i,M=T -V: /rGrri'C ARft A AG A AT ATSC ATTTTTTT AT A A ACTTG AT AT AAT ACCAAT 
:,0 6,760 6770 6780 6730 6800 


c.Qf i 500 510 620 630 640 650 

A^vr 'ViTr"'-I ;• -oil, u v tTl A f AGa fl GiACAAG fTGYf AACACCTCAGTCATTACACAGGCCTGTCCAAAGGT 

, . » . * i . i : ; : ‘ ; ; : ! I : ; ! ; ! ! I ! ! ! I I I 1 .. ((l , i ......... . 

/V'.'vr^-vTi : - T'-' 1 ' I f- r- O -i fTGACAARTTGTAACACCTCAGTCATTACACAaSCCTGTCCAAAGGT 

‘ st’lO " " GS2G 3850 8840 6850 6860 6870 

660 6-0 6.6:0 630 700 710 720 730 

A'l r ( • rr-f-r'M: 'y •AAT" 5GGATACAT~ATTGT GCCCCGGCTGGTTTTGCGATTCTAAAATGTAATAATAAGAC 

, V~ , . . , . ' T i : . i > . * 1 > ; ! 

‘777 ^^^Ap-AAaT-'tVl^pAi-'-frATTCTr-iCCCCGGCTGGTTTTGCGATTCTAAAATGTAATAATAAGAC 

Saso' ’ .6450 SSOO ' 6310 6320 6330 6940 


Ml I 

6350 



SLi'JC 

SSOO 

£310 

‘T 

4U 750 


760 

kTC: 

i ■ ■ • t : r ; : i : f 

1 

'.AA-fuTCAOiCACi 

■ • t ! 1 1 1 ) 1 I t t 

i ! \ 

£Ti~: 

; - • 1 l t 1 ■ i ' : 1 

'xTP:ct- 

| ; t t t 1 I 1 t 1 T I 

-lAA': 6TCAGCAC: 


(.fccv- 1 bb 


6380 


, , , i t i t t t i i i i . « i i i . * * ' ' * 1 ' 1 1 1 * * ' ' * ' 

5TACAATGT ACACATGGAATT AGGCCAGTAGTATC 
6330 7000 7010 7020 

810 820 830 840 850 860 870

AACTCAACTGCTGTTGAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGCCAATTTCACAGACAA

, i i i t i i i : i : i t i t i i i i • t i t i i i i t i i i i t i ( t i i i i i i ) i i i i t < » i i t ... t i t i i i i i 

I ( I I I t r i : ; : i I i t i t : i ; i i t t t i»rii<)iii**r»***<**i**ti*»t't*****tiit*itit 

AACTCAACTGCTGTTAAATGGCAGTCTGGCAGAAGAAGAGGTAGTAATTAGATCTGCCAATTTCACAGACAA
7030 7040 7050 7060 7070 7080 7090 

880 890 900 910 920 930 940

TGCTAAAACCATAATAGTACAGCTGAACCAATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAG

■ ; • ) i i i i i i ; * i ! : t i • ■ i t t i i t ■ i i t t i i t t i i i t i i ■ i t i i i i t i i i i i i i i » » i i t i • • i * » t » * * t 
TGCTAAAACCATAATAGTACAGCTGAACCAATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAG
7100 7110 7120 7130 7140 7150 7160 

950 960 970 980 990 1000 1010

AAAAAGTATCCGTATCCAGAGGGGACCAGGGAGAGCATTTGTTACAATAGGAAAAATAGGAAATATGAGACA


t i : t i t i j i t i i i i i 


AAAAAGTATCCGTATCCAGAGAGGACCAGGGAGAGCATTTGTTACAATAGGAAAAATAGGAAATATGAGACA
7170 7180 7190 7200 7210 7220 7230

1020 1030 1040 1050 1060 1070 1080 1090

AGCACATTGTAACATTAGTAGAGCAAAATGCAATGCCACTTTAAAACAGATAGCTAGCAAATTAAGAGAACA

, , , , , , 1 • t I I : ; I I J : > I t ; I I I I I t t I I lit , , ) ! I I ■ | I I I I t I I I I I I I I I I I I I I I I I I I I I I 
AGCACATTGTAACATTAGTAGAGCAAAATGGAATAACACTTTAAAACAGATAGATAGCAAATTAAGAGAACA
7240 7250 7260 7270 7280 7290 7300

1100 1110 1120 1130 1140 1150 1160

ATTTGGAAATAATAAAACAATAATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTT

, i | t I ) : I r t i i i t i i i i i I i ! : i i i : t i t i i I t r i ! I I t t l t i I i I i l i l i * i t i I i I I I I * t I I I i i l i 

, , , , t , | t - i * . i t i i i t t f 1 I : i : I • ; : : i ! I iiitfiltlitlltllllllllltlllltlllttttli 

ATTTGGAAATAATAAAACAATAATCTTTAAGCAGTCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTT 
7310 7320 7330 7340 7350 7360 7370 7380 

1170 1180 1190 1200 1210 1220 1230 

TAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGTTTAATAGTACTTG

, , , . , t t t t t t r t l I t i l t l t t i t i I i i t t t : I t l t t I t I I I t I » • • I I t t > t * * * * * * * * * • 1 • 1 < I 1 1 * 1 
TAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGTTTAATAGTACTTG
7390 7400 7410 7420 7430 7440 7450

1240 1250 1260 1270 1280 1290 1300

GAGTACTGAAGGGTCAAATAACACTGAAGGAAGTGACACAATCACACTCCCATGCAGAATAAAACAATTTAT

t t.l i l t I t i I I i i : I t t ( t t I i t t ■ ■ ■ I t ■ t < f t I I I ) t I 1 i i i i i i i i i i i 1 I l I I 1 I I ! I I till 
Ill’ll! i i : : i t t i i i t j t • t i i j t t i t i l l l t l l i i i i i l i i i i I I I i i I I i t i I i I i t t i i i till 

GAGTACTAAAGGGTCAAATAACACTGAAGGAAGTGACACAATCACCCTCCCATGCAGAATAAAACAAATTAT 
7460 7470 7480 7490 7500 7510 7520

1310 1320 1330 1340 1350 1360 1370 

AAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGCGGACAAATTAGATGTTCATCAAA

I I , g , 1 I | | I , ! I 1 I | | | | | t | | | | | I I • t I ! ( I I I I I ! I < I I < I I I I I 1 I 1 I I I t I I I I t I I 1 I t I I ) I I 
AAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGTGGACAAATTAGATGTTCATCAAA
7530 7540 7550 7560 7570 7580 7590

1380 1390 1400 1410 1420 1430 1440 1450

TATTACAGGGCTGCTATTAACAAGAGATGGTGGTAATAACAACAATGGGTCCGAGATCTTCAGACCTGGAGG


TATTACAGGGCTGCTATTAACAAGAGATGGTGGTAATAGCAACAATGAGTCCGAGATCTTCAGACCTGGAGG
7600 7610 7620 7630 7640 7650 7660


1460 1470 1480 1490 1500 1510 1520

AGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGT

AGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGT

7670 7680 7690 7700 7710 7720 7730 7740

1530 1540 1550 1560 1570 1580 1590

AGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCT

, , . . . : . ; : • ; : : : : : : : : : : . 


AGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCT
7750 7760 7770 7780 7790 7800 7810

1600 1610 1620 1630 1640 1650 1660

TGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCACGGTCAATGACGCTGACGGTACAGGCCAGACAATT
TGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATGACGCTGACGGTACAGGCCAGACAATT

7820 7830 7840 7850 7860 7870 7880


1670 1680 1690 1700 1710 1720 1730

ATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAACAGCATCTGTTGCAACT

;; i : ; i:::: i i . .. 

ATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAACAGCATCTGTTGCAACT

7890 7900 7910 7920 7930 7940 7950

1740 1750 1760 1770 1780 1790 1800 1810

CACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCT
. ... > 1 1 1 1 !! !‘. !!!!!!!!!!!!!!!!■ i i • 1 1 1 1 1 1 1 1 1 1 




CACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCT

7960 7970 7980 7990 8000 8010 8020


1820 1830 1840 1850 1860 1870 1880

CCT>" i’i'lGA '"T'i“Cl-lGGrTGCTCTGGlAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAA 

. . : ! 1 1 ; ; ; I ; ; ; ; ; ; ; ; ! 1 ! ! I ! ! I ! ! 1 1 ! I ! ! I.• *. . 


CCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAA

8030 8040 8050 8060 8070 8080 8090 8100

1890 1900 1910 1920 1930 1940 1950

TAAATCTCTGGAACAGATTTGGAATAACATGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAG

TAAATCTCTGGAACAGATTTGGAATAACATGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAG
8110 8120 8130 8140 8150 8160 8170


ur-qo 1370 1590 1390 

CTTAATACT.TTCC'rTAGTfTG.AAGAP.TCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGA 

* i * ; j ; | ; ! | j ; ; \ ; \ ; j [ ; J J | 1 I \ 1 ! ! ! ! ! ! ! 1 ! 1 * • * • « * < * * 1 1 * 1 * ' ‘ ' 1 ' ' 1 * ' ' ' 

f'TTiNATAf'^CTCCTTAATTGAAGAATCGCftAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGA 

:;-130" 8190 8200 8210 8220 8230 8240 

2030 2040 2050 2060 2070 2080 2090

TAAATGGGCAAGTTTGTGGAATTGGTTTAACATAACAAATTGGCTGTGGTATATAAAAATATTCATAATGAT
.. i i i i i i i i i t i t • 1 » 1 1 1 1 1 1 1 ’ 11 1 *!!!!’. i i i i i i t i i i • t » ' * • 


2000 


2010 


2020 


:::: i::::::::::::::::: 


Tftc,A nTGf- iV'AC'TTTr'TGGflATTGl-STTTAACATAACAAATTGGCTGTGGTATATAAAATTATTCATAATGAT 

g 2 L; ( ‘; . o ; > £:0 og',0 8280 8290 8300 8310 

omn 'jnu 2icso 2130 2140 2150 2160 2170 

2 j£ Tp . mA ^T- GGTAG6T7 rAATAATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATA 

ARTAGGAGG«TTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTT1CTGTAGTGAATAGAGTTAGGCAGGGATA 

8320 8320 6340 3350 8360 8370 8380 

i. >f j o i C|Q £900 2210 2220 2230 2240 

TTCPr.GATTATrCTrTiCAGACCCAtX-rCCCAACCCCGAGGGGACCCGACAGGCCCGAAGGAATAGAAGAAGA 

i :.. 

TTf'Ar'CA-n ATl:g 7T iTJAGACr.CY-’.CCTCCCAATCCCGAGGGGACCCGACAGGCCCGAAGGAATAGAAGAAGA 

8390 " ' £494 8410 8420 8430 8440 8450 8460 


"i r/T 





AGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCCTTAGCACTTATCTGGGACGATCT

., . . . , . . . , , . .... i ... i ■ ■ i i t i i i t i ■ l I i I I i i t t i I I I i I I I • t i i 


I.. I i i I t i i I i i i t i i 
AGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCCTTAGCACTTATCTGGGACGATCT

8470 8480 8490 8500 8510 8520 8530


2330 2340 -2350 2360 2370 2380 

GCGG AC-iCC’.TTO;'; GCCTCTTCAGCT ACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAAC 

. . . . * . . . . . , # f t I I l 1 ■ t t ! > t 1 t I I t t I I I » t I I I I » * I » * * * * • • • » 1 1 • * 
GCGGAGCC--TGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAAC

8540 8550 8560 8570 8580 8590 8600


2350 2400 2410 

TTC'1'GGGACGCAGGGGt-.iTGG.GAAiiiCCC 


2420 


2430 


2440 


2450 


1 ! i i i t i i 


TTCTGGGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAGCTAA
8610 8620 8630 8640 8650 8660 8670


2460 

AG 


3. KUNZ-158-CL39.SEQ

HIVMAL.CG Human lymphadenopathy virus (MAL isolate), complet

ID   HIVMAL.CG standard; RNA; 9229 BP.

XX 

AC   X04415;

XX 

DT   17-OCT-1986 (incorporated)

XX 

DE   Human lymphadenopathy virus (MAL isolate), complete genome.

XX 

KW   acquired immune deficiency syndrome; env gene; gag gene; genome;
KW   long terminal repeat; pol gene; polyprotein; provirus;
KW   reverse transcriptase.

XX 

OS   Human lymphadenopathy virus
OC   Viridae; ss-RNA enveloped viruses; Retroviridae.

XX 

RN   [1] (bases 1-9229)
RA   Alizon M., Wain-Hobson S., Montagnier L., Sonigo P.;
RT   "Genetic variability of the AIDS virus: Nucleotide sequence
RT   analysis of two isolates from African patients";
RL   Cell 46:63-74(1986).

XX 

CC   Acquired immune deficiency syndrome (AIDS) is caused by a
CC   retrovirus known by several different names, probably representing
CC   two separate strains: human T-cell lymphotropic virus-III
CC   (HTLV-III) and lymphadenopathy-associated virus (LAV) are thought
CC   to be one strain and AIDS-associated retrovirus type 2 (ARV-2) the
CC   other. The three viruses, whose sequences do not differ by more
CC   than about 6%, are believed to belong to the retroviral subfamily
CC   Lentivirinae, or "slow" viruses. For the details of the annotation
CC   and for other pertinent references, see the HIV reference entry.

XX 


FH   Key  From  To  Description
FH

FT 

P; :, T 

l 

/};/ 

R repeat 5’ copy 

FT 

r:-t 

l 

1 77 

5 ; LTR 

FT 

s v re 

1 7S 

ITT, 

primer (Lys-tRNA) binding site 

FT 

cr,:-; 

1S50 

1 oo / 

gag polyprotein 

FT 

FT 

CLS 

1 dL-3 

4c*Y 

po1 po1yprotein (NH2-terminus 


FT 

* V * 1 

i; o 

481 o 

v'i 

FT 

a: 

\ 

L 3 >. 


FT 

Cli 

V S 

54015 

5f 

FT 





FT 

~;n 

"5 


s; 

FT 

t 

s 

7953 

8< 

FT 

cr- 

B 

33GO 

SH 

FT 

F:P 

• r 

SG70 

9; 

FT 

RF 

r 

91 5 4 

9; 

XX 





SQ   Sequence 9229 BP;

sor 23K protein
urfC

tat protein, exon 2 (first expressed exon)

envelope polyprotein precursor
tat protein, exon 3 (AA at 7960)

27K protein
3' LTR

R repeat 3' copy

1627 C; 2204 G; 2043 T; 0 other;

Initial Score    =  916   Optimized Score = 2060   Significance = 0.00
Residue Identity =  84%   Matches         = 2085   Mismatches   =  350
Gaps             =   20


AA&AGGAGAAGACAGTGGCAATGAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGGGGTGGAAATGGG 

t!i'i:tiritt t , , : f : i . i : : t i i t » ^ » * 1 ! • ' ' « ' ' ' ' ' ! ' !!!!!! ! ! ! ! ! 

, , • i : 1 t t i * i < i i i t i i i ; i i i t i t t t t t i i i i i i * ' ' ' * i i i i i i i i i i i 

AAGAGCAbirtAGATAG'l GGCAATGAGAGTGAGGGAGATACA-GAGGAATTATCAAAA—CTGGTGGAGATGGG 
5780 5790 5800 5810 5820 5830 5840

80 90 100 110 120 130 140

GCACCATGCTCCTTGGGATATTGATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAGTCTATTATGGGG

: i ; ::::::: : : : : : : : :::::::: : : : : : :::::::::: 

GCA' i 'GATGCTCC: fTGL-.GATGTTGATGACCTGTAGTATTGCAGAAGATTTGTGGGTTACAGTTTATTATGGGG 
5850 5860 5870 5880 5890 5900 5910

150 160 170 180 190 200 210 

TACCTGTGTGGAAGGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTAC

I . I t I t I , • I 1 : i I I I I I I ; t ; i I : ; : I I I I ! I t I I I I * i t I I t I I I t I I I I ) I I i 1 I I I I I I (111 

, , t * I I » : i i : < : i i i : i t t : t i i i I t i i i i i I i i i ( > t i i * • • < * * ' <. I i t 1 1 ' 1 ' 1 ' * ' 

TACCTGTGTGGAAAGAAGCAACCACTACTCTATTTTGTGCATCAGATGCTAAATCATATGAAACAGAAGTAC
5920 5930 5940 5950 5960 5970 5980 5990


220 230 240 250 260 270 280 

ATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGTAAATGTGA

1,11 i t ? ! ; t t [ t ! 1 i t I t 1 i I t » i t I 1 t I ,, I , I I I , I i i i 1 i t I i » III t ill I i I l l l l 

, , i , i , » t * i i i t t i l t i > i i t l t t i i i ♦ t l l t ... l l I t 111 l III I I I l l l i 

ATAAGATCTGGGCTACACATGCCTGTGTACCCACGGACCCCAACCCACAAGAAATAGAACTGGAAAATGTCA
6000 6010 6020 6030 6040 6050 6060 

290 300 310 320 330 340 350 360

CAGAAAATTTTAACATGTGGAAAAATGACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATC


CAGAAGGGTTTAACATGTGGAAAAATAACATGGTGGAGCAGATGCATGAGGATATAATCAGTTTATGGGATC
6070 6080 6090 6100 6110 6120 6130

370 380 390 400 410 420 

AAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTAGTTTAAAGTGCACTGATTTG-GGGAATG

r , J I I T : I I t t :: I i t i i i i i i i I I I i i « » J j j j J J j j j j ] j J J lit! 11 

AAAC.CCTAf *AACCA7 GTGT AAAECT AACCCCACTCTGTGTCACTTT AAACTGCACT AATGTGAATGGGACTG 
6140 6150 6160 6170 6180 6190 6200

430 440 450 460 470 480 490 

CTACTAAT-ACCAATACTAGTAATACCAATAGTAGTAGCGGGGAAATGATGATGGAGAAAGGAGAGATAA 

ii iti * i t : i !■ i i t t » t . t i ii i iii lit i i i i i i ! ! ! ! ! I I 

, , t t t , I til It I I ! I t 1 t I I II I III III t 1 I 1 I I 1 I l l 1 I I 

CTGTGAATGGGACTAATGCTGGGAGT-AATAGGACTAATGCAGAATTGAAAATGGAAATTGGAGAAGTGA 

6210 6280 6230 6240 6250 6260 6270 

500 510 520 530 540 550 560 570 

flAAACTGCTCTTTCAATATCAGCACAACNATAAGAGGTAAGGTGCAGAAAGAATATGCATTTTTTTATAAAC 
;;;;; ;;;; ;; ;;;;;;; ; ; ; ; j ; ; ; ; ; ; ; : : I 1 I I I I ! ! 1 ! I !!!!!!! ! 1 

AAAf !T; GCTCTTTC AATATAA0CCCAGTAGGAAGT6ATAAAAGGC-AAGAAT ATGCAACTTTTT AT AACC 

6260 6280 6300 63.10 6320 6330 6340 

560 550 600 610 620 630 

TTfVTI •ATAf'.rpCfTAA’i 'A-44TCAI(3&r^rmrrf prWOrnTTria raAgTTCTflflrflrrTrflCTraTTfl 








TTGATCTARTA8AAA7 AGATGATAGTGATAATAGTAGTTATAGGCTAATAAATTGTAATACCTCAGTAATTA 
G3UO ;,3S0 6370 6380 6390 6400 6410 

640 AF;o 670 680 690 700 710 

UARARGCl: fGTr:OAAAi>:GTATCm TTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTC 

. . • i i i i • < i i t I i i i i i i • i t t t i i i ■ i ■ i » i t ■ > i i i i 


; t | t I I l . t I I I I 1 I l * I * I < ' ' ' 1 ' 1 * 

lit, t i i t t t : t t i i t » i i i 1 1 1 1 * 1 ' ' 


CAu-.GGCTTGTCUAAAGSTAACC'f TTGATCCAATTCCCATACATTATTGTGCCCCAGCTGGTTTTGCAATTC 
6420 £430 6440 6450 6460 6470 6480 


720 730 740 750 760 770 780 

•ppjPjj^TGTAATAATAAr^CGTTCAATGGAACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATG 

. , . , . , . , , i t i i t i t ...iri.it i t t i i i i • t i t i i i i i * * • 

; ; ; ; : : ; : : ; ; i : . . . « . -• . < - * * .. t ......... . 

T A A' M TfGT A ATG AT A AG A A6iTTCAATGG A ACGG A AAT AT6T AAA A ATGTCAGT ACAGT ACAATGT ACACATG 
6490 ’ 6300 651.0 G520 6530 6540 6550 6560 

750 800 CIO 320 830 840 850 

pqq-i fe,r-L- r HO/vvTA&' ATCAArTt .'AAC v GCT6:TTGAATGGCAGTCTAGCAGAAGAAGA6GTAGTAATTAGAT 

j ; V' : ] V! ; ; i : ! ; j ; i ; ; ; ;; ; ; : ; ; : im : : : : ::::::: 

QP'AT rAAOr infT.TG'RTGTCAAC.TCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGATAATGATTAGAT 

' 6570 6930 6590 6600 6610 6620 6630 


860 070 880 890 900 910 920 

CTTCJCAAT “;'CA(JA.GACAATGCTAAAACCATAATAGTACAGCTGAACCAATCTGTAGAAATTAATTGTACAA 

1 ' * , | , , , , , , , , , , , i .. iiii.iiiiiiiii 


t * t : t < .. 

; l : i t i i I l t l i l t . 


CTGAAAATCTrACAGACAATACTAAAAACATAATAGTACAGCTTAATGAAACTGTAACAATTAATTGTACAA 

6640 6S5<) 6660 6670 6680 6690 6700 


830 rao S50 960 970 980 990 

^-v :aacaatp,; ;aagaaaaagtatccgtatccagaggggaccagggagagcatttgttacaataggaa 

T 7'..;; •; ;i; 11:: : ; ;! 5 ; :::::::: :::: : ::::::::: 

1956'" i'■ 'i'-T-'iT:-T:AAGAAGAC-:R;i?ATACATTTC-GGCCCAGGGCAAGCACTCTATACAACAGGGA 

67 lu 6720 6730 6'7’40 6750 6760 6770 

lOOO KUO 1 020 1030 1040 1050 1060 1070 

ARGAAATATRAGACAASCACATTGTAACATTAGTAGAGCAAAATGCAATGCCACTTTAAAACAGATAG 

; ; ; : : ; ; ; ; [ ; ; ; : I ; ; ! ! ! i ! ! ! ! I I . . * . .... - < ... . ■ » ■ 

-rpr -; qp6Pi ^TpTp:qryAAi 'ri 'ilai ZPCi ATTGTACT ATT AATGAAACAGAATGGGAT AAAACTTT ACAACAGGT AG 
. S7jr ;0 ' ‘ 3790 6800 6810 6820 6830 6840 

io:.;o >30 uoo mo 1120 1130 1140 

CTAi CAAATTAA.6AGTACAATTTGGAAATAATAAAACAATAATCTTTAAGCAATCCTCAGGAGGGGACCCAG 

V t V , , , , 1 , , t 1 . . . .» 11.11. ..... ' ' 1 1 ’ ' ' ' ! ! ! ! I ! ! ! ! ! ! ! 

* t * 1 tt t \ it t t 1 ttiiii 1 1 1 1 t 

5 T r.-r i¥ qqrr.T;:-- lT r- ! rf TAACAAAT'lCAAAAATAATTTTTAATTCATCCTCAGGAGGGGACCCAG 

5050 6680 6870 6830 6890 6900 6910 

ii« 0 190 1170 1130 1190 1200 1210 

AfWWAACRi :ACA, o TTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTA 

: m : : : s :::::::: m ::: i 

A-OA I i ACAAUAOACAi.. PfTTAATi C'iTAGAGGGGAATTTTTCTACTGTAATACATCAAAACTGTTTAATAGTA 
J2.) P630 6340 6950 6960 6970 6980 


).p;p -‘TAO KKO 1250 1260 1270 1280 

CJ-YT..9r-|' ( q-Vi r.iKACTTG.GAG'TACTGAAGGGTCAAATAACACTGAAGGAAGTGACACAATCACACTCCCAT 
;:, .i tggiG-«-,i C^GCTGCTTATCGCCATCT-TGCTTTTAAGTGTCTATGGGATCTATTGT

£•740 £750 5760 5770 5780 5790 

BO SO 100 110 120 130 

T-CCATGCT. :CTTGGGATATTQATGATCTGTAG-TGCTACAGAAAAATTGTGGGTCACA 

, - - • , t , , . , .tiiit ► * * * i * • » ' ' 

, 1 ; r t ! I til 1 ( 11*1 I I 1 I 1 I It II 

GARTC7- 7 ! TATGGTGTACCAGCT-TGGAGGAATGCGACAATTCCCCTCTTCTGTGCA 
£310 £820 5830 5840 5850 5SG0 

J. £0 1 SO 170 ISO 190 

i'ACCTRl G't 'GO A AGGAAGCA—ACC—ACCACTCTATTTTG—TGCATCAGATGCTAAAGC 

, ■ I I ' ; i t ! I it I I r I I I • I III I I I I t I t 

; ; : . , * f ■ t < • i i i i i * * * * * » * 

:,:l;;.''.TA57-Ti~SG:3!AACAACTCAGTGCCTACCAGATAATGGTGATTATTCAGA—ATTGGCCC 
£i ; >!;50 5890 5S00 5910 5920 5930 


TT^-', reTYACA2AAft37rT7i4ATGCTTGGGAGA-AT-ACAGT- CACAGAAC - AGGCAATAGAGG 

5.540 ' 5350 5960 5970 5980 5990 

200 25') 300 310 320 330 

1 .-i f fui 1 i iV.~ uiAC:<AA 1 TTTAACA1GTGGAAAA—ATGA—CAT GGTAGAAC AGAT 


/'n"- , 1 :>■ ; '!, ! 1 1 J*. ’ I C4A TAAAGCCTTGTGTAAAATTAToCCCATTATGCATTACTATGAGAT 

''t.OOO J ' lioio.3020 6030 6040 6050 6060 

3*0 350 3S0 370 380 390 

l v, c , ATl . V : l -r:'v-; *vVOY:Ar>T-'TAT -AGPAT-CA—AAGCCTAAA-GCCATGTGTAAAATTAACCCC—AC-T 

T/V ; : : ; : : : : : : ::::::: 

GCAAVAAAf’ :o A,'i-,:b:03ATTSACAAAATCATCAACAACAATAACAACAGCAGCACCAACAT 

5070. GOSO 6030 6100 6110 6120 6130 

-11 ; \~r) 430 440 450 460 

rTA.SmF-; fA AAL-':TRi').-iCTRATTT GGGGAATGCT ACT AAT ACCAAT ACT AG—T AAT ACCAATAGTAGT 

; • j ; •; ■ ; ; ! ; : ; ’ \ ! ! ! ! ! ! ! ! ! ! Will 

CfiGCACCAG TATCA'i AAA. A A.' VrAG : ACATGGTCAATGAGACT AGTTCTTGT A—TAGCTCAGAATAATTGCACA 
0 1 4;_. si 50 6160 6170 6180 6190 6200 

-■70 a AO 490 500 510 520 530 

0611RGGA'V AYGA-•VGATGimASAAAGEAGAGATAAAAAACTGCTCTTTCAATATCAGCACAAGNATAAGAG 
: • : : ;: :; : ; : : : : : ; : : \:: 
GC.O'rf'GGAACAAGA^CAAATGATAAECTGTAAATTCACCATGACAGGGTTAAAAAGAG-ACAAGACAAAGGA 

92 U) s220 6230 6240 6250 6260 6270 

r -., i0 550 560 570 580 590 600 

—AATA'fGn—<Tl TT—TTTT AT AAACTTGAT AT AAT ACCAAT AGAT AATGAT ACT AC 

;.i; : : : , ;: : ; ::::::: : : 

6TACOATG5A4C.TGG/ACTCTACAGATTTGGTTTGTGAAC—AAGGGAATAGCACT—GATAATGAAAGCAG 
6280 6230 6300 6310 6320 6330 6340 

0ic (£2Q 6-30 640 650 660 670 

I'r-tz-;- • ■;0TT|~p.0f^AGTTGTAAr.ACCTCAG'TCATTACACAGGCCTGT - CCAAAGGTATCCTTTGAGCCAA 

7 T i : _ ::::::: : : : : : : : : : : 

ATGOTACA - —*1 t-SAATCACT 1 if r A AC ACTTCTGTT ATCCAAGAGTCTTGTGACAAACAT - TATTGGGATACTA 
6350 6330 ' 6370 6380 6390 6400 6410 

030 390 700 710 720 730 740 

T i'Ci :GAT AC A11 ATTG fGCCCCGGCTGGTTTTGCGATTCTAAAATGT AAT AAT A—AGACGTTCAATGGAACA 
j ; ! ; 1 ; I ! ! J I ! ! ...» 1 . 1 » , ,i , 1 . 1 • * * * * • 

Ti'Aiv ATTTAG31 ATTGTGCACCTCCASGTT ATGCTT TGCTT AGATGT AATGACACAAATTATTCAGGCTTTA 
3420 3430 6440 6450 6460 6470 6480 

750 760 770 780 790 800 

GkA l: —CA1GTAt; A AATGT-: 'AECACAGT ACAATGTACACATGGA—AT-TAGGC-CAGTAGTATCAACTCA 
! : 1 ! ! ! ! !!!!!!! ! \ I ! ! ! 1 I ! ! ! I ! ! • 

fAAATGT' 'CT AARGTBfT TBGTCTCTTC—ATG—CACAAGGATGATGGAGACACAG—ACT-TCTACTTG 
6490 6500 B510 6520 6530 6540 6550 

8)0 820 830 840 850 860 870 

■>. ^-'V07 fSAA '"O't.CAGTCTAGGAGAAGA - AGAGGTAGTAATTAGATCTGCCAATTTCACAGACAATGCTA 

" w ; ; I ! ! :7; : ! • ; ; ; : : : : : : : : : : : : : :::::: : : : : : - 

l ; .;TT*i G3CT fTAATGGAAC1 'AG AGCAG A A AAT AG A ACTT AT ATTT A-CTGGC—ATGGTAGGGATAA TA 

65.30 6570 65S0 6530 6600 6610 6620 

880 GSO 500 910 920 930 940 

AA'SCCATA/ l-TAG—TACAGCTGAACCA ATCTGT AGAAATT AATTGT ACAAGACCCAACAACAAT ACAAGAA 

, , ■ • ■ . . . . . ; : ; . . ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; j j ! : : : ! ! 

i: i ; ’S;',TAT A A" iTAilf ITT A A AT A AGT ATTATAATO’TAACAATGAAATGTAGAAGA - CCAGGA AAT AAG 
66 'O 6640 6650 6660 6670 6680 
/>[ /--■'j-'i | yj- TA—CCA-S i'CACCATTATGTCTGGATTGGTTTTCCACTCACAACCACTCACTGATAGGCC

' Gsllo 6700 3710 6720 6730 6740 6750 

< C : »0 1030 1040 1050 1060 1070 1080 

-'R5A--CAT f'GTAACAl I AG—TAGAGCAAAATGCAATGCCACTTTAAAACAGAT—AGCTAGCAAATT— 

1 ’ " . * i i ill 
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O—-V-VTiCAGGCA' 1 RGT-GTTGG.YTTGGAGGAAAATGG 

6760 6770 6780 6790 6800
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iOSO 1100 1110 1120 1130 1140 

—PAGAGAACA-AT—TTGGAAATAATAAAAC—AAT—AATC—TTTAAGCAATCCT—CAGGAGGGGACC 


iiii i i i i i 


TCAAACATCCCAHGTATACYGGAACTAACAATACTGATAAAATCAATTTAACGGCTCCTGGAGGAGGAGATC 

6820 6830 6840 6850 6860 6870 6880 6890

1150 1160 1170 1180 1190 1200

CAGAAATTGT^Ai IRCAl -A—GT—TTTAATTGTRGAGGGGAATTTTTCTAC.TGTAATTCAACACAACTGTTTA 

; ;V: ;: :• :; ;; ; :; ; ; :: :::::: 

QTGAA - ii iTAi iCTT'JATKrGGACAAATTGCAGAGGAGAGTTCCTCTACTGTAA - AA—TGAATTGGTTT 

6900 6910 6920 6930 6940 6950

1210 1220 1230 1240 1250 1260 1270

ATAGTACT i'GGTTTA—ATAGTACTTGGAGTACTGAAGGGTCAAATAACACTGAAGGAAGTGACACAATCAC 

; : ' ; : : : ; : i : ; : ! : : : : : ; : : : : : : ! : ! : 

CTA- AATThRCTAGAGtSATAlviGGATGTAACTACCCAGAGGCCAAAGGA—AC-GGC AT AG A AGG A ATT AC 

6960 6970 6980 6990 7000 7010 7020

1280 1290 1300 1310 1320 1330 1340

Ar~T T.CATRC AGAATA AAACAA7TT AT AA ACATGTGGCAGGA AGT AGGAAAAGCAATG—TATGCCCCTCCC 

’ : ; ; T' : ;;::: :: :: : *.: r.iw 

-!-TC ; iCCGTivTCA fATTAGACAAATAATCAACACTTGGCATAAAGTAGGCAAA—A ATGTTT ATTTGCCTCCA 
7030 7040 7050 7060 7070 7080 7090

1350 1360 1370 1380 1390 1400 1410 1420

ATmGCGGACA^ATTAGATGTTCATCAAATATTACAGGGCTGC TATTAACAAGAGATGGTGGTAATAACAAC 

; : • : : :: : :: : i : : :::::: .. 

^ptvGAGGRAGACCTCACGTGTAACTCCACAGTGACCAGTC I CATAGCAAACATAGATTGGACTGATGGAAAC 
7100 7110 7120 7130 7140 7150 7160 

1430 1440 1450 1460 1470 1480 1490 

AATGGGTCCGAG.'i-ATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGA—GAAGTGAATTATATAAA 

. . . . , , , i . a • i i » i i i i i ii till t 

i i * * i i * 1 i t t I » i i i i i « > * * it iiii i 

dp_pot TTAGTATC-ACCATGA—GTGCAG—AGGTGGCAGAACTGTATCGATTAGAGTTGGGAGAT 

7170 7180 7190 7200 7210 7220 

1500 1510 1520 1530 1540 1550 

TATA A ART/ ‘C.\ A Al-'A A1 TGAACCATT AGGAGTAGCACCCACCAAGGCAAAGAG AAGAGTGGT GCA— 

;;;;;; ii;;; i : ! ! i ! J > • « ■ * « * * * • • * ■ ■ • • .* * 1 * 

TATA A ATT AGTAGAGATCACTCCGATT GGCTTGGCCCCCACAGATGTGAAGAGGT ACACT ACTGGTGGCACC 
7230 7240 7250 7260 7270 7280 7290 
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GAGAGAAAAAAGfTGCAGTGGFAATAGGAGCTT fGTTCCTTGGGTTCT—TGGGAGCAGCAGGAAGCACTATGG 
iiiij i * i , , ; i i ! i iii i i i t t t i i i i » ' i j j I I * I ! ; ' ! ! I 

T -M y AAA'ITA AAf- — AG—Gbii v'T CTT T RTGiCT ARGGTTCTTGGGTTTTCTCGCAACGGCAGGTTCTGCAATGG 
7300 7310 7320 7330 7340 7350 7360
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GCOCACGRTr.RA ■" Y-iCGCTG;7CGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATT 

;; ; ; ; ; : ; i i :::::: : 

GCGCGGCGTCSTTCAGGCTGACCGiCTOAGTGCCGGAGTTTATTGGCTGGGATAGTGCAGCAACAGCAACAGC 
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TCTC-nrtACrV“TAnTeTOCCATRSCCAAATGCAAST-CTAACACCAGACTGGAACA-AT GA—TA 
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_r "tgrcAA-WV! i7R6Abr.GAA0.GGT TGACTTCTTGGAGGAAAATATAACA-GCCCTCCTAGAAGAGGCA 
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CftAATTCAACAAi 1 ' -,i AG: A AGAACATS7ATEA ATTAC A AAAGTTGAATAGCTGGG- ATGTGTTTGGCAATTGGTT 
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TBAC.C TT&T1TTCT Tbib: ATA AAGTATAT- ACAAT ATGGAATTT ATG-T AGTTGT AGGAGT AAT ACTGTTAAGA 
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C''T' 1 7 A FT Ft"'' -or-T/y^CT CATACCCAACAGRACCCGGCACTGCCAACCAGAGAAGGCAAAGAAGGAGACG 
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(Vrr.r. ,£GAAi rE CCRTGGGAACAGCTCCTGGCCTTGGCAGATAGAAT AT ATTCATTTC—CTGATCCG-CCAA 
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RA Dsuros’.ers Kt C. ? Ti dials P. * Sonigo P. S 
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SITE 
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FT 
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R gene product < AA 1-101) 

FT 

SITE 

5758 
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ORF * exon 1 ( tat) 

FT 

FT 
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tat gene product < AA 1-99) 
<9083 is 2nd base in codon) 

FT 

SI fE 
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□RF» exon 1 <art> 

FT 

FT 

CDS 
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art gene product <AA 1—23) 
<9033 is 1st base in codon) 

FT 

SITE 
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□RF < snv) 

FT 

FT 

TVS 
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3300 
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FT 
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□RF; exon 2 <art) 
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AAGAG-CiASAAGACftSTGGCAATGAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGGGGTGGA 


ATGLGTTGTCTTGGAAATCAGCTGC—TTATCRCCATCTTG-CTTCTAAGTGTCTATG6-GAT-TTATTGTA 
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Si. GO 


6110 6120 6130 6140 6150


70 80 90 100 110 120 130

—AATGGGGCACCAT G.Cl CCTTGGGAT ATTGATGATCTGT AG-TGCTACAGAAAAATTGTGGGTCACAG 


TTCAATAT6TCA-CAGTCY-TTTATGGTGTACCAGCT-TGGAGGAATGCGACAATTCCCCTCTTCTGTGCAA 

6160 6170 6180 6190 6200 6210 6220


140 150 160 170 180 190 200

TCTATTATGG:LLTftCCTGTGTGGAAGGAAGCA--ACC-ACCACTCTATTTTG-TGCATCAGATGCTAAAGCA 


it t t 


CCAAGAATU- U :AT.-iCT—TGGGPAACftftCTCAGTGCCTACCAGATAATGATGATTATTCAGA—ATTGGCCCT 
6230 6240 6250 6260 6270 6280 6290


210 220 230 240 250 

T—A1 3ATACAGAGGTACA7 AATGTTTGbi-GCCAC ACATGC—CTGTGT ACCCACAG—ACC 


YT;A I'GTTACAG AAAGCTTTGATGCTTGGGAGAAT ACAGTCACAGAACAGGCAAT AGAGGACGT ATGGCAACT 
6300 6310 6320 6330 6340 6350 6360


260 270 280 290 300 310 320

»;-CAACCCACAAEAAGTAGTATTG-GTAAArGTGACAGAAAATTTTAACA-TGTG-GAAAAATGACATGG 

i * t * * * * * » t * * i t t i i i i i t t [ ! ! ! 

i i t t t t t i 


C‘I TTGAGACCTCAATAA—AGCCTTGTGT AAAATT ATCCCCATT ATGC ATT ACT ATGA6ATGCAAT AAAAGT 
6370 6380 6390 6400 6410 6420 6430


330 340 350 360 370 380 390

TAGtAACAG/ATGCATGAGGAT ATA Pi fCAGTTT AT-GGGATCAA-AGCCT A AAGCCATGTGT AAAATT AACCCC 


iiii 


GAG—ACAGATAAATGGGGAT—TGACAAAATCATCAACAACAACAGCATCAA—CAACAACAACAACAACAGC 
6440 6450 6460 6470 6480 6490 6500


400 410 420 430 440 450 460 

ACTOTGTG fTAGTTTAAAGTOCACTGATTTGGGGAATGCTACTAATACCAATACTAG-TAATACCAATAGTA 


AftAATCAB-TAGAGACAAGAG.AC —AT—A6TCAATGAGACTAGT—CCTTGTGTAGTTCATGATAATTGCA 

6510 6520 6530 6540 6550 6560 6570


470 480 490 500 510 520 530

GTA'-,.CGGGGAA-A fV-iA-TGATGGAGAAAGGAGAGAT AAAAAACTGCTCTTTCAATATCAGCACAAGNATAAG 


CAGGCTTGGAACAAGAGCCAATGATAAGCTGTAAATTCAACATGACAGGGTTAAAAAGAG—ACAAGAAAAAG 

nvo --sgg- 


6580 6590 6600 6610 6620 6630 6640
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ASGTAAGGTGCAGAAAG—AATATGC—ATTT-TTTTATAAACTTGATATAATACCAATAGATAATGATACT 

. . . . . . . . . . . i t t t i t i i it t i i i t i i t t t 


1 l i i ill i it 

i i t t tit i t i 


.tit it t i i i t i t t t t 
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(sARTACAATCAAACTniGiTACTCTGCAGATTTG&TTTGTGAAC—AAGGG'AATAGCACT-GGTAATGAAAGT 
6650 6660 6670 6680 6690 6700 6710


600 610 620 630 640 650 660 670

ACCAGCTA' fACS'fT GACAAG fTGTAACACCTCAGTCATTACACAGGCCTGT-CCAAAGGTATCCTTTGAGCC 


ii it 


it i i i i i t 


AGATGTTACA-T~AATCACTGTAATACTTCTGTTATCCAAGAGTGTTGTGACAAAGAT—TATTGGGATGC 

6720 6730 6740 6750 6760 6770 


680 690 700 710 720 730 740 

AATTCCCATACA -TTATTGT6CCCCGGCTGGTTTTGCGATTCTAAAATGTAATAATA-AGACGTTCAATGGA 


i t | ii t t i i i i i i t t i t tiit iii i t i i i i i i i i i i i i * i * 

TATT-AGAfGTAt i ATATTGTGCACCTCCAGGTTATGCTTTGCTTAGATGTAATGACACAAATTATTCAGGCT 

6780 6790 6800 6810 6820 6830 6840


750 760 770 780 790 800 

ACAGGACCA—TGTACAAATGT—GAGCACAGTACAAT6TACACATG6A—AT—TAGGC-CAGTAGTATCAAC 


ttii 


l i tii iiii tti ii ii » tit i t ii 
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TTATGCCTAACTGTTC f AAGGT AGTGGTCTCT TC-ATG—CACAAGGATGATGGAGACACAG—ACT—TCTAC 
6850 6860 6870 6880 6890 6900 6910 


810 820 830 840 850 860 870 

TCAACTGCTGTT GAAT GGCAG'i" CT AGCAGAAGA—AGAGGT AGT AATT AGATCTGCCAATTTCACAGACAATG 


TTSGTTTCGGTTTAAT GGAACTAGAGCAGAAAATAGAACCTATATTTA-CTGGC—ATGGTAGAGATAA— 

6920 6930 6940 6950 6960 6970 6980


880 890 900 910 920 930 940

CTAAAACCAT AA-TAGTACAGCTGAACCAAT-CTGTAGAAATTAATTGTACAAGACCCAACAACAATACA 

. . . , . i . ■>■! ii . i iii t 


i i i lit it iiii iiii 

i it iii ii iiii iiii tit 


-TAGGACTATAATTAGTCTAAAT-AAGCATTATAATCTAACAATGAAATGTAGAAGA-CCAGGA—AAT—A 
6990 7000 7010 7020 7030 7040


950 960 970 980 990 1000 1010 

AGAA A AAGTATC-CGTA'TCCAGAG&GGACC A—GGGAGAGCATT—TGTT ACAAT AGGAAAAAT AGGAAAT AT 


AG—ACAG'fTT -TA-CCA-GTCACCATTATGTCTGCATTGGTTTTCCACT—CACAACCAGTCAATGA 

7050 7060 7070 7080 7090 7100 


1020 1030 1040 1050 1060 1070 

GAGAC—AAGCA—CATTGT -AACATTAGTAGAGCAAAATGCAATGCCACTTTAAAACAGAT—AGCTAGCA 

. . . . . ... . . . . ■ i i i i i i it l iii it 


GAGGCCAA AGC .AGGCA'f GGTGTAGGTT—TGGAGGAAATTGGAAGGAGGCAATAAAAGAGGTGAAGC-AGAC 
7110 7120 7130 7140 7150 7160 7170 


1080 1090 1100 1110 1120 1130

AAT!-AAGAGAACA AT—TTGGAAATAATAAAAC—AAT—AATC—TTTAAGCAATCCT-CAGGAGG 


CATTGTCAAACATCC0A6GTATACTGGAACTAACAATACTGATAAAATCAATTTGACGGCTCCTAGAGGAGG 
7180 7190 7200 7210 7220 7230 7240 


1140 1150 1160 1170 1180 1190 1200 

GGACCCAGAAATT6TAACGCACA—GT—TTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACT 

. . . . . . . . • t > i i i > ■ i i i t i t it •> t 


iii i i i i i i • t t t it 

l ill i l i i t l I t t i 


AGATCCGGAA-GTTACCT TCATGTGEACAAATTGCAGAGGAGAGTTTCTCTACTGTAA-AA-TGAATT 

7250 7260 7270 7280 7290 7300 7310 


1210 1220 1230 1240 1250 1260 1270

GTT i AATA6 TACi'TGGTTTA— ATAGTACTTGGAGTACTGAAGGGTCAAATAACACTGAAGGAAGTGACACA 


i * i i 


ii t ii i i it 

i tilt i it t t it 


GGTT TCTA -AAT'I GGGTAGAAGA!AGGAGTCTAACTACCCAGAAGCCAAAGGA—ACGGCATAAAAGG-A 

7320 7330 7340 7350 7360 7370 7380 


1280 1290 1300 1310 1320 1330 1340












^TC:'- v .CPiCTCCGi ! *vru’Ci-M'.riAriTAP;PiftGAftTTTAlY^AAC^TGTGGCAGGAAGTAGGAAAAGCAATG TATGCCC 


, , , i , i >t i t t i t.iii ... > • 1 ! ! ! ! ! I 

||It t It Jill I ! I t I I I I I I t I II 


ATfftC-GTftCCArGiTCATATT^GftCAAATAATCAACACTTQeCATAAABTAGQCAftA—AATGTTT ATTTQC 
7390 7400 7410 7420 7430 7440


1350 1360 1370 1380 1390 1400 1410

CTCCCATCAGCGR ACAAATT AGATGTTCATCAAATATT ACAGGGCTGCT ATT AACAAGAGATGGTGGT AAT A 

. . . . . . . . • ■ i i i ill 


: t ; t ! 


iii it tt i i i i i 

tit it ii t * 


CTCCAAGAGAGG5f',GACCTCACGTGTAACTCCACAGTGACCAGTCTCATAGCAAACATAAATTGGACTGATG 

7450 7460 7470 7480 7490 7500 7510 7520


1420 1430 1440 1450 1460 1470 1480

PCAACAATGGGTCCGAG—ATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGA GAAGTGAATTAT 


t tii 


rjppiPncA-AACTAGT ATC-ACCATGA—GTGCAG —AGGTGGCAGAACTGT ATCGATTGGAATTGG 




7530 7540 7550 7560 7570 7580


1490 1500 1510 1520 1530 1540 1550

ATAAftTATAAAGTAGIAAAAATTGAACGATT AGBAGT AGCACCCACCAAGGCAAAGAG-AAGAGTGGT— 


GAGATTATAAA'f fAGTAGAAATCACTCCAATT GGCTTGGCCCCCACAAATGTGAAGAGGTACACTACTGGTG 
7590 7600 7610 7620 7630 7640 7650


1560 1570 1580 1590 1600 1610 1620

fiCA- -GAGAEAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCT-TGGGAGCAGCAGGAAGCAC 


i ii i 


t iii i i i t i i t i i i i i i i ' 1 l » i i 

i j i i t i i i i i i i t i i i i i ' i i i i i 


RCAfinTCAAGAAATAAAAG'- AG-GGGTCTTTGTGCTAGGGTTCTTGGGTTTTCTCGCAACGGCAGGTTCTGC 
7660 7670 7680 7690 7700 7710 7720


1630 1640 1650 1660 1670 1680 1690

TATG GGCGCACGGTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAA 

. . . , . , i i i i i i i i i i i i i i i i i i ill i 


till i i i i i i i i i i i ill 

i . i i i itit t i i i .. iii 


AATGQGCGCGGCGTCGTTGAr.CGTGACCGCTCf-'SGTCCCGGACTTTATTGGCTGGGATAGTGCAGCAACAGCA 

7730 7740 7750 7760 7770 7780 7790 


1700 1710 1720 1730 1740 1750 1760

nAATTTGCTGAGi^GCTATTGAGGOGCAACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCA 


ACAGCTGTTGGACGTGGTCAAGAGACAACAAGAAT1GTTGCGACTGACCGTCTG6GGAACAAAGAACCTCCA 
7800 7810 7820 7830 7840 7850 7860


1770 1780 1790 1800 1810 1820 1830 

RGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAA 

. . . . , . . . . t < i i i ■ I i i i • 


GAr: 1AGGGTCHCTGCCATCGAGAAGTACTTAAAGGACCAGGCGCAGCTAAATGCTTGGGGATGTGCGTTTAG 
7870 7880 7890 7900 7910 7920 7930


1840 1850 1860 1870 1880 1890 1900

AGTCATTTGCACnACTGCTGVGCCTTGG-AATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGGAA 


ACAAGTCTGTCACACTACTGTACCATGGCCAAATGCAAGT-CTAA-CACCAGA-TTGGAA 

7940 7950 7960 7970 7980 7990


1910 1920 1930 1940 1950 1960 1970

TAACATGACCTGSATGGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACA—TTCCTTAATTGAA 


CAA'l GAGACT1G3CAAGAGT GGGAGCGGAAgGTTGAC—TTCTTGGAGGCAAAT AT AACGGCCCTCCTAGAA 
8000 8010 8020 8030 8040 8050 8060


1980 1990 2000 2010 2020 2030 2040

GAAinGCAAAA5CAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGG—AA 
• j • j • : : : i : : • ::::::: : : : : : : ::::::::::::: 

GAB5GACAAA1 TOO ACAAGAGAAGAACA fSTA FGAATT AC AA AAGTTGAAT AGCTGGG—ATGTGTTTGGCAA 
8070 8080 8090 8100 8110 8120 8130






'Vr 


o n'-'d 


onQn, 








T7RGTTTAACATAACAAATT3(XTGTGGTATATA-AAAAT ATTCAT AATGAT AGT AGGAGGCTTGGT AG 

, , . . . , , , , , , , . . . . , i i ii * * < * ' * * * * ' ' 1 1 ! * ! ! ! ! 

. ; ; ; > . . t > *.»<**< • < * * * * 1 * • * • * * ’ * 1 ' 

TTGCiTTTGACCTTAGTTCTTGGAT AAAGTAT ATACAAT ATGGAATTT AT-AT AATTGT AGGAG-T AATAC 


8140 8150 8160 8170 8180 8190 8200


2120 2130 2140 2150 2160 2170 2180

GTTTAAGAATAG TTT‘1 TGCTRT ACTTTCT AT AGTGAAT AGAGTT AGGCAGGGAT ATTCACCA TT ATCGT 

: : : : : :: : ::: ::: :: :: : 

TGTYAAGAATACIGAYCTATATAG f ACAAATGCTAGCTAGGTTAAGACAGGGGTATAGGCCAGTGTTCTC T 
8210 8220 8230 8240 8250 8260 8270


2190 

TTCAGACCCACC1 


2200 2210 2220 2230 

-CCCAACCCCGAGGGGACCCGAC—AGGCC—C-GAAGGAATAGAA 


TCr.CCACD Ti “.TT ATT TCCAGT AG ACGC AT ACCC AACAGGATCCGGCTCTGCCAACC A A AGAAGGCAAAA A A 
8280 8290 8300 8310 8320 8330 8340

2240 2250 2260 2270 2280 2290 2300

GAAGAAGGTGEAGAGAGAGACAGAiaACAGATC-CATTCGATT AGTGAACGGATCCTT AGCACTT ATCTG 

:; : : : : : : : : : : :: : : ::::::: 
GBAGACGG rGGAGGCAGCGGTGGCAACAGCTCCTGGCCTTGGCAGATAGAATATATTCATTTC—CTGATCCG 
8350 8360 8370 8380 8390 8400 8410 

2310 2320 2330 2340 2350 2360 2370

GGACfiATC fRCRRAGC—CTTG TGCCTCTTCAGCT ACCACCGCTTGAGAGACTT ACTCTTGATTGT A—A 

: : : : ;: :: ::: :::::::: : : : 

-CCAACTGATACGCCTCTTSACTTGGCTATTCA6CAA-CTGCAGAACCTTGCTATCGAGAGCATA 

8420 8430 8440 8450 8460 8470 8480


2380 2390 2400 2410 2420 2430 2440 

n£4r-'GAT T >T!'GBAACTTCTGGGftCGCAGGGGGTGGGAAGCCCT-CAAATATTGGTGGAATCTCCTACAGTAT 

11 * . . iii » i i i t i i » i i » 


til » l l i ill 


CCA- GATCC1CCAACCAATA fTCCAGAGGCTCTCTGCGACCCTACGGAGAATTCGAGAA-GTCCT—CAGGCT 
8490 8500 8510 8520 8530 8540 8550


2450 X 

TGGAGTCA—GGAACTAAAG 

i t t i i i i t t t 

l ; t t l ■ • - : t 

TGAAC TGACC i ACC I ACAA 
8560 8570
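The rows of ':' characters between the two sequence lines in each alignment block appear to mark columns where the aligned residues are identical. The following is a minimal sketch of how such a marker row could be regenerated from two gapped sequences. The function name `match_line` and the use of '-' as the gap symbol are assumptions made for illustration, not details taken from the program that produced this listing.

```python
def match_line(query: str, target: str, gap: str = "-") -> str:
    """Return a marker row: ':' under identical residues, ' ' elsewhere.

    Both inputs are assumed to be already aligned (equal length, with
    gap characters inserted). Comparison is case-insensitive.
    """
    if len(query) != len(target):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for q, t in zip(query.upper(), target.upper()):
        # A gap never counts as a match, even opposite another gap.
        marks.append(":" if q == t and q != gap else " ")
    return "".join(marks)


query = "AAT-GGC"
target = "AATAGCC"
print(query)
print(match_line(query, target))
print(target)
```

Printing the three rows together reproduces the layout used in the listing: query on top, marker row in the middle, database sequence below.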


8. KUi32’-• 1 b£--CL3: : Y hEC; 

HIV2RODX Human immunodeficiency virus type 2 ROD isolate RN

ID HIV2RODX standard; RNA; 9671 BP.
XX
AC X05291;
XX
DT 04-JUN-1987 (annotation)
XX
DE Human immunodeficiency virus type 2 ROD isolate RNA genome
DE (HIV-2)
XX
KW acquired immunodeficiency syndrome; art gene; env gene; f gene;
KW gag gene; pol gene; q gene; r gene; tat gene.
XX
OS Human immunodeficiency virus type 2
OS ROD isolate
OC Viridae; ss-RNA enveloped viruses;
viruses; Retroviridae.
XX
RN [1] (bases 1-9671)
RA Alizon M.;
RT ;
RL Submitted (33-1CN-13Y7) on tape to the EMBL Data Library by

P, A1 , ,; aLt .. .. . ««~l rwoc IIA1 1"T2 *-* 



j. _ _ I Li ! 1 4 . w_- J ? 1 I IJ l i viJ >-• 


RL Pasteur, 25 rue du Dr Roux, 75724 Paris CEDEX 15, France.
XX
RN [2] (bases 1-9671)
RA Guyader M., Emerman M., Sonigo P., Clavel F., Montagnier L.,
RA Alizon M.;
RT "Genome organization and transactivation of the human
RT immunodeficiency virus type 2";
RL Nature 326:662-669(1987).
XX
RN [3]
RA Clavel F., Guyader M., Guetard D., Salle M., Montagnier L.,
RA Alizon M.;
RT "Molecular cloning and polymorphism of the human immune deficiency
RT virus type 2";
RL Nature 324:691-695(1986).


XX 

FH   Key      From      To        Description
FH
FT   SITE     1         173       R region
FT   RPT      1         233       LTR
FT   6' f G             9671      HIV-2 RNA
FT                                corresponding to integrated
FT                                proviral DNA
FT   SITE     174       239       U5 region
FT   SITE     30a       320       primer binding site
FT   CDS                2111      gag protein
FT   CDS      1603      4936      pol protein
FT   SITE     4o 7.3    4928      polypurine tract 2
FT   CDS      4893      L.r rr ■* -7   q protein
FT   CDS      5982      csss      r protein
FT   CDS      5845      6140      tat protein part 1
FT                                (6140 is 2nd base in codon)
FT   CDS      so y \    6140      art protein part 1
FT                                (6140 is 1st base in codon)
FT   CDS      B;U47     8720      env protein
FT   CDS      8307      8400      tat protein part 2
FT                                (8307 is 3rd base in codon)
FT   CDS      8307      8536      art protein part 2
FT                                (8307 is 2nd base in codon)
FT   CDS      8557      T3324     f protein
FT   SITE     3385      8933      polypurine tract 1
FT   SITE     8942      9497      U3 region
FT   RPT      8942      9671      LTR
FT   PRM      3323      3339      core enhancer sequence
FT   PRM      340 5.    9416      core enhancer sequence
FT   SITE     3420      91427     pot. SP1 factor binding site
FT   SITE     3428      9437      pot. SP1 factor binding site
FT   SITE     3433      3448      pot. SP1 factor binding site
FT   PRM      3485      B470      TATA-box
FT   SITE     9498      9671      R region
FT   SITE     BEA3      9654      pot. polyA signal
FT   POLYA    9671      9671      polyA site

XX 

SQ   Sequence 9671 BP; 3314 A; 1973 C; 2401 G; 1983 T; 0 other;
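The four base counts on the SQ line (3314, 1973, 2401 and 1983, plus 0 other) add up to 9671, the entry length, which gives a quick consistency check when digits have been damaged in transcription. The sketch below shows one way to automate that check. The helper name `sq_counts_consistent` and the exact spacing of the example line are assumptions for illustration only.

```python
import re


def sq_counts_consistent(sq_line: str) -> bool:
    """Check that the per-base counts on an SQ line sum to the stated length."""
    length_match = re.search(r"(\d+)\s+BP", sq_line)
    if length_match is None:
        raise ValueError("no 'BP' length found on SQ line")
    declared = int(length_match.group(1))
    # Collect every "<number> <base>" pair; 'other' covers ambiguous residues.
    counts = re.findall(r"(\d+)\s+(A|C|G|T|other)\b", sq_line)
    return declared == sum(int(n) for n, _ in counts)


line = "SQ   Sequence 9671 BP; 3314 A; 1973 C; 2401 G; 1983 T; 0 other;"
print(sq_counts_consistent(line))  # True
```

A failed check does not say which number is wrong, only that at least one of the length or the counts has been misread.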


Initial Score = 306 Optimized Score = 1194 Significance = 0.00
Residue Identity = 53% Matches = 1410 Mismatches = 893
Gaps = 332 Conservative Substitutions = 0


X 10 20 30 40 50 

AA'-AGCA-GAAFA-CAfT—C-iGCAATGAGAGTGAAG—GAGAA—AT—AT C AGC—ACTTGTGG AG ATGG 

: ; ; ; : : ; :: : : : ; : : : ::::::: : : :::::: : : *• 

A A "• < -iAACTAAu : r-iCt CATCCGTC. TCCT AC ACC AGACAAGTGAGT ATGATG A ATC AGCTGCTT ATTGCCATTT 
>; SHO 6 i 20 C13C 6140 6150 6160 6170 




B3^'M'^Gftft^VrGQi: i t3CMCCft^'»£CTCC;TTG^aGA I'ftTTG- 


-ATG—ATCTGT AGTGCTACAGAAAAATTG 


TftT T AGCTAGTGC-TTG-CTTAGT AT ATTGCACCCAAT ATGT AACTGT-TTTCTATGGCGTACCCA 

6180 6190 6200 6210 6220 6230


130 140 150 160 170 180

■ iy_ ;v:(^TCAi ;AG7C7 AT i ' ATGGGGTACCTGT3TGGAAGGAAGCAACCACCACTCTATTTTGTG CAT-CAG 

. . , . . . i * t i titi i i i i » * * 1 1 1 ! ! ! 


CGTGGAAAAATC-.iCAACCATTCCCCTCTT7TGTGCAA--CCAGAAATAGGGA-T ACTTGGGGA ACCATAC AG 

6240 6250 6260 6270 6280 6290 6300


190 200 210 220 230 240 250

-- .yfGCTArW-CAT-ATGATACAGAGGTACATAATGTTTG- -GGCCACACATGCCTGTGTACCCACAGACC 


ill i ii 


TGCTTGCC fGACAATGATGATTATCAGG- -AAATAA—CTTTGAATGTAACAGAGGCTTTTG—ATGCATGGAAT 
6310 6320 6330 6340 6350 6360 6370


260 270 280 290 300 310 320

CC^V'C-CCACAAGAAGTAGTPfi TGGTAAATG—T6ACAGAAAATTTTAACATGTGGAAAAATGACATG—GTA 

* V ' , ‘ . . . , , , t i > i t iii i t i i ii i.i* .lit !!!!!! 


AATACAGTAACftGAACAAGCAATAGAAGATGTCTGGCATCTATTCGAGACAT-CAATAAAACCATGTGTC 

6380 6390 6400 6410 6420 6430 6440 


330 340 350 360 370 380 390 

yp^nftOPVT'GCATlaAGGATATA ATCAGTTTATGGGATCAAAGCCTAAAGC CATGTGTAAAATTAACCCCAC 

... . .... i ill* ti ill i i l i i i 


AAAC.TAACACCTTTATGT6T-AGCAATGAAATGCAGCAGCACAGAGAGCAGCACAGGGAA-CAACACAAC 

6450 S4S0 6470 6480 6490 6500 6510 


400 410 420 430 440 450 460

TCTETETTAETTTAAAGTGCACTGATTTGGEGAATGCTACTAATACCAATA-CTAG-TAA-TACCAATAG 


-CrCAAAGAGCAO'AA- GCACAACCACAACCACACCCAC—AGACC AGGAGC A AG AGAT A AGTGAGGAT AC 

6520 6530 6540 6550 6560 6570


470 480 490 500 510 520

YAGTAGCG Ci: ^.GAA ATGA—TGATGGAGAAAGGAGAGAT AAAAAACTGCTCTTTCAAT ATCAGCACAAGN 

... ... ; i l l l lit* III II III 1 ' ! 


TCCA fGCGCACGCGCAGACAACTGCT—CAGGATTGGGAGAGGAAGAAACGATC AATTGCCAbTTCAA T 

6580 6590 6600 6610 6620 6630 6640


540 550 560 570 580 590 

a : |6 :.m3AGS -SG TGCAGA - AAGAATATGCATTTTTTT AT AAACTTGAT AT AAT ACCAAT AGAT AATGAT A 

\ : * *V' .*1 j'l.i i ; . ■ ; ; ; ; ; ; ; ; ; ; ! ! ! 111! ! ! 


AGGAT'i 'AivAAAGAGA‘1 'AAGAAAAAACAGTATAAT—GAAACATGGTA-CTCAAAAGATGTGGTTT 

6650 6660 6670 6680 6690 6700 6710


BOO bJ,U &rZKJ OOW WWW - 

PTAf ICAGC "A ToCGTTGACAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGC 

I,., i , ... .. '' * : ' : : : : ! : ; 


01 o 


20 


630 


640 


650 


660 


i i i i i titi. • * * > ; ; |! ! 

I I I . I «• , , . . ..I 


GTGAGACAA^l (■ :A VAGCftCAA-ATCAGACC“CAGT“GTTACATGAACCATTGC AACACATC AGT 

S720 6730 6740 6750 6760 6770 


680 690 700 710 720 730 740

HAATTCCCATftCATTATTGTECCCUGGCTGGTTTTGCGATTCTAAAATGT aataata-agacgttcaatgga 

. , t ii. i.i iii ii ii It' • ! ! ! ! 


, I t I III III 1 - - 

; ; i ; i . . . ....... . ■ ■ > 

Qcy -TCACAC A--ATCA-TGTG ACAAG—CACTATTGGGATGCTATAAGGTTTAGATACTGTGCACCACCGGG 
6780 6790 6800 6810 6820 6830


750 760 770 780 790 800

ACAI.vf-.iACC -AT:'-; TP.C& -AATGTCAGCACAGTA—CA- ATGTACACA—TGGAATT AGGCCAGT AGT A 

. . . . , . . ... ii i i i i i i i» iii i i i i t 


TTATGCCG f ATT.ArV.aATG I AAVGA'! ACCA—ATTATTCAGGCTTTGCACCCAACTGTTCTAAAG TAGTAGCT 
6840 6850 6860 6870 6880 6890 6900




o -?yQ 













jfTCriPiC—TC-CI FG—T GGUAGTC’t AGCAGAAGAAGAGGT AGTAATTAGATCTGCCAATTTCAC 

... . . .... iii iiii i i i i i i 

iii i 


7 It I 


i » : itt it it 

1 J ! tit til t 1 1 I 


TCTACftT B TftCC .A6.GA' f GATGG?- iftACGCAAACTTCCACATG-GTTTGGCTTTA—A—TGGCACTAG AGC 

691.0 6920 6930 6940 6950 6960 6970 


870 880 890 900 910 920 930

AGACAATGCTAAftACCft —Tr^VTAGTACAGCTGAACCAATCTGT AGAAATT AATTGT ACAAGACCCAACAAC 

1 ” ..... .... , . i i i i i i ii tilt iiii 


AGft9ft:Tl7¥9ftAi^VfATftTCTftTTGGCATRGCAGftGATAA-TAGAACT—ATCAT-CA-GCTTAAACA— 

6980 6990 7000 7010 7020 7030


940 950 960 970 980 990 1000

APff Ai JAAGl’ ‘.ft—ftftftiGTATCCGTATCCAGftGGGGACCAGGGA—GAGCATTTGTT AC-AAT AGGAAAAAT— 
I ’ ; j 1 ! ! IIII!! I I I I I I I I I I I I II * > * * • * • * 11 

AATATTA1 ftfYTt 0 "GAG' fTTGCATTGT AAGAGG-CCAGGGAAT A AGAC AGTG AAAC A A AT AATGCTT ATGT 

7040 7050 7060 7070 7080 7090 7100


1010 1020 1030 1040 1050 1060 1070

-c--i ■; -ftft ft-|'{VV GftGp CJft'TGC AG ATTG' fAAHATTAGT AGAGCAAAATGCAATGCCACTTT A AAAC A-GAT A 

’;Vi : : ! ; : : ; : :: : ; : : : :::::: : : : : : : : : ’• : : 

ftftfrftiACATb^GT itcactcccpct—pc—cagccgatcaata—aaagacccagacaagcatggtgctg 
7110 7120 7130 7140 7150 7160


1080 1090 1100 1110 1120 1130

q _r'fft—i-.0/?/7<ft-fTftft6At : .lft—AGftAT—TTGGAAAT—AATAAAACAATAATCTTTAAGCAATCCTCAGG 

; : ‘ : \\ : V: ; ; ::: : :: :::: : : :: :::::::: 

GTTftftftAftnTj? Vft TT uBftftftGftiJG'Jl.-ftl u-iO'ftGGiftbiGTGAAlabiAAACCCT TGCAAAACATCC - CAGGTAT 

7170 7180 7190 7200 7210 7220 7230


1540 1150 1160 1170 

ALr9GlGACCCP.G/-'iAA*!''tT5TAftf;GCACAGTTTTAATTGTGGAG- 


1 180 

-GGGAATTTTTC-T AC- 


1 190 
—TGTA— 


i iii i 


AtrV-V" 'C- ftfty- -i -.j AC At 1 ft AG rftftTATTftGCTTTGCAGCGCCAGGAAAAGGCTGAGACCCAGAAGTAGC 

•- 05 O.7260 7270 7280 7290 7300 


7240 


1200 1210 1220 1230 1240 1250 

PiT*j" r/> -*CftO--AAiT".7T1 f APT I 'ft iST ACTTGGTTTAAT AGT ACTTGG AGT ACTGAAGGGTCAAAT - AACAC 

; : T: i : ; !::; : :: ; :: ; :; : : : : : : : : : : 

ATACATSTGGACTftr ‘.CTiaCA* aft.GG AG—fiGTTTCTCT ACTGCAACAT—6ACT—TGGTTCCTCAATTGGAT AG 
7310 732.3 7330 7340 7350 


7360 


7370 


1260 1270 1280 1290 1300 1310 1320

TGAAt-„GAAr-TGACA - C Aft *'C ACACTCCC ATGCAGAAT AAAACAATTT AT AAACATGTGGCAGGAAGT AGG 

. , . , . i .1 i:i .iiii ii* * * • »t»i i i i i i i 1 ' ! ! ! 

i ; ; II, III ;.i. ; i it it* i ■ i » i ii* i > - tit* i i . . » » .it.. 

fl5pL\ft._7ftfti7ftCftoftGQ7C AftTfATGC—ACCGTGCCAT ATAAAGCAAAT AATTAACACATGGCATAAG6T ACS3 
7380 7390 7400 7410 7420 7430 7440


1330 1340 1350 1360 1370 1380 1390

AAAAGCAATG—fATGCCCC rCCCATCAGCGGACAAATTAGATGTTCATCAAATATTACAG—GGCTGC-TA 

. . . . . . . , .... iiii t i i i i 


—GAGA AA i'llTO) fATl TSCCTCCCA - GGGA—A6GGGAGCTGTCCTGCAACT—CAACAGTAACCAGCATA 

7450 7460 7470 7480 7490 7500


1400 1410 1420 1430 1440 1450

TTftftCA AGAGft’i 'GG—TRGTAATftACAftCAATGGGTCCGAGAT-CTTCAGACCTGGAGGAGG—AGA—TA 


f i it 


ftT-, RiTTA-ftCft-'T PACTGGCAA-AACAAT AATCAGACAAACATTACCTTTAGTGC-AGAGGTGGCAGAACTA 
7510 7520 7530 7540 7550 7560 7570


1460 1470 1480 1490 1500 1510 1520

T 9ft ’: ,GGACft -41 A33AftAARTGAATT A1 AT A A AT ATAA AGTAGT A A AA ATTGAACC ATT AGGAGT AGCACCC 

: ;;; ; ;: :::::::: :::::::: ::::::: :::::: 

Y-ACAirftTTGGAGTTGGGftGAT—TAT AAftT - T6GTAGAAATAACACCAATTGGCTTCGCACCT 

7580 7590 7600 7610 7620 7630



PiC-t \H&C('if¥-Pt iPA- -AG AGTGGTGCAGAGAGAAAAAAGAGCAGTGG—GAAT AGGAGCTTT6TTCCTTGGQT 
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PlCV ! i^nAGft/-'^.''.n&A1'ACTCCTC;Tlj;CTCACGB5AeACATACPiAGAGQTGTGTTCGTGCTAGGGTTCTTGGGT 
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::::::::: : : : : : : : i : : s : -. . : : 

TT^f O^GCAP.AAGCAGGT i CTnCAATbGGCGCGGCGTCCCTGACCGTGTCGGCTCAGTCCCGGACTTTACTG 

77,0 ' ’ rv20 7730 7740 7750 7760 7770 
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TCTfiGTATfiGTtGABCAGCAGAACtAATTTGCTGiAGGGCTATTGAGGCGCAACAGiCATCTGTTGCAACTCACA 

, ; ; ; ; ! ! 1 ! ! ! . . . ..... . ....... ... • * 

l5 CCC ; ReAT06Tu6A,6CAACA6CAAnA9CTGTT6GAC6TGGTCAAGAGACAACAAGAACTGTTGCGACTGACC 
7780 7790 7800 7810 7820 7830 7840 7850
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GTCTGGGGnpTCAAGCAGCTGCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTG 

.. . . . . , , , , , <i i .tilt. .... ii i . * * 


. .... .. . . • * 


GTC'i GG6GAA; T..APAAACni CCAGGCAAGAGTCACTGCTATAGAGAAGTACCTACAGGACCAGGCGCGGCT— 
7860 7870 7880 7890 7900 7910 7920

1820 1830 1840 1850 1860 1870 1880

GGGATT— I 6GGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTagttggagtaata 


-AAATTCA TGGRGATETGCGTTT AGACAAGTCTGCCACACT ACTGT ACCATGG- 
7930 7940 7950 7960 7970 
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AAT5TCTGGAAAAGATttggaataacatgacctggatggagtgggacagagaaattaacaattacacaagct 
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ATTOGTTAGCfiCGTG'iGTGGGAGA.'-'TATGACGTGGCAGGAATGGGAAAAACAAGT CCGCTACCTGGAGGC 
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-mai ru-G fTAA'TT-GAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGA 

V:V; : V ; ' ; :::: ::::::::::::: :::::: :::::: 

AAATATCAbTV-,r ;: 1 AAGTTTAGAACAGGCACAAATTCAGCAAGAGAAAAAT ATGT ATGAACT ACAAAAATT AAA 
8060 8070 8080 8090 8100 8110 8120
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lOi if V fGGG! ITi AGTTTGT GG AATTGGTTT AAC ATAAC A A ATTGGCTGTGGT AT A-T AAAAAT—ATTC AT AAT 

.. , . , iiit ii. i ...it » . * * * 1 * ' 11 

i ; ; ; ; i ; ; ; ; ! .. > •• • . • • • 11 1 1 ■ 1 1 

-f , tpggATh'. T fTi GGC A ATTGGTTTGACTTA ACCTCCTGGGTCAAGT AT ATT C AAT ATGG AGTGCTT AT 
8130 8140 8150 8160 8170 8180 8190 

2100 2110 2120 2130 2140 2150 2160

GATAGTAGivAO-.iCiCTTSGTAS&TTTAAGAATASTTTTTGCTGTACTTTCT AT AGTGAAT AGAGTT AGGCAGGG 

: :: ; : : : • : 5 : : 

w - <•>! 3TAGr:A‘v-'| AATAGCTT 7 AAGAATAGTGATATATGTAGTACAAATGTT AAGT AGGCTT AGAAAGGG 

8200 8210 8220 8230 8240 8250 8260
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ATATTCAC5—ATT ATCGTTTC- 


2190 2200 2210 2220 
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CTA'T AGGCi’Ti i"!"TTT5TC T TC.CCCCCCCGGTT AT ATCCAACAGATCCAT ATCCACA—AGGACCGGGGACAGC 
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• —Gen O >£' ■ A A'l AG A AG AAGAAGG fGGAGAGAGAGACAGAGACAGAT CCATTCGATT AGTGAACGGAT 

?; . . ,'Y •" -.i - : : ; ; ; : : : : : : :: 

-if .-.rvi^pr^^.p^.qQ PiCG ETGiGPtf^CiCAACGGTGGPtGRCAGATACTGGCCCTGGCCGATAGCAT 

'-^^O l ’ # "" 1 * :J;^30 .8360 8370 8380 8390 8400 
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, , , ' ] , | til II I lit! til I I 1 t I I II ‘ * I I I I 

fTftC.VTATDY-.—HGALOTTCCfG.ACCCTCCAACTCATCT-AC—CAGAATCTCAGAGACTGGCTGAGACTT 

£40;.! 84C-;U 3500 8510 8520 8530 

P430 2440 2450 X 

GGTGGAATC—TCCTACAGTATTCiiGAETCA—EGA ACT A—AAG 


AGA.'iOAGCOT 
854:; 3 


iTA VGGCuGCGASTGGATCCAAGAAG 
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107S 


put. snv gene 

(1 is 2nd base in codon) 

in frame stop codon 


Sequence 1142 BP; 322 A; 208 C; 273 G; 231 T; 108 other;


Initial Score = 259 Optimized Score = 565 Significance = 0.00
Residue Identity = 54% Matches = 622 Mismatches = 488
Gaps = 33 Conservative Substitutions = 0


■'T TT ’'" i T: 


130 1 190 1200 

-ttt I’Tar rnTMTTr oorflr i 


1210 


1220 



:H6AQGfta^GTTCCTCTACTGTAA-AfV-TGAATTGGTTTCTA—AATTGGGTAGAGGA 

X 10 20 30 40 50 60


1230 1240 1250 1260 1270 1280 1290

•; ('-'iCT.! i AC1 biAAG.ii->ib:TGPiAAT AACftCTGAAGGAPiGTGACACAATCACACTCCCAT6CAGAATAAA 

. . . . . iii • i i i > * t iii 


rAGGGATG fAAC rACCCAGAtsSCCAAASGA-AC-GGC AT AGAA6SA ATTAC—GTGCC6TGTCAT ATTAG 

70 80 90 100 110 120 130


1300 1310 1320 1330 1340 1350 1360

ACAfYI" fTA'i'AAACATGT GGC; 'iGGRAGTAGGAAAAGCAATG—T ATGCCCCTCCCATCAGCGGACAAATT AGA 


t i i : i r i iii till ill 

t i t t i iii iiit iii 


i i 

i i 


ACAAATAA 7CAACAC TTGGCAT AAAGT AGGCAAA—AATGTTTATTTGCCTCCAAGAGAGGGAGACCTCACG 
140 150 160 170 180 190 200


1370 1380 1390 1400 1410 1420 1430

!G7TCATCRAA7 ATTAC.AGGGCTGGTATTAACAAGAGATGGTGGTAATAACAACAATGGGTCCGAGATCTTC 


t iiit 


rGTAACTCCACAGTGACCAGTCrCATAGCAAACATAGATTGGACTGATGGAAACCA-AACTAATATCACC 

210 220 230 240 250 260 270


1440 1450 1460 1470 1480 1490 1500

AGACUTG- -GAGHAiJ?--AGA Tft T'GAGGG ACAATTGGAGAAGTGAATTATATAAATAT AAAGT AGT AAAAATT 


! t ; [ , t I I t I I t I I I* 1 I I I I I I I I I 1 I 

, ; ; ! , , t i i i t t t t ti i i i t i i i i i i i i t i it 


ATG;- iGTGCAGASGTGGCAGAACT- - -GTA7CGATTGSAGTTGGGAGAT—TATAAAT-TAGT AGAGATN 

280 290 300 310 320 330


1510 1520 1530 1540
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GAftnCATTAGGAGTAGCACCCACGAAGGCAAAGAG-AAGAGTGGT—GCA—GAGAGAAAAAAGAGCAGT 


i i i : i 


NiVGN'NHNNWNGiVNiOiNNOCCCCACAGATGTGAAGAGGTACACTACTGGTGGCACCTCAAGAAATAAAAG—AG— 

340 350 360 370 380 390 400 


1580 1590 1600 1610 1620 1630 1640

GG GAATAGGAGCTTTGTTCCTTGGGTTGT—' TGGGAGCAGCAGGAAGCACTATGGGCGCACGGTCAATGACGC 

. i ■ t t i ■ i i i ill 


t t i 


i i i i i i 


GCrin r/'rTT .-;TG:CYAG ! .'iSTTCTTGGGTT'l TCTCGCAACGGCAGGTTCTGCAATGGGCGCGGCGTCNNNNNNNN 
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1650 3660 3670 1680 1690 1700 1710 

iTAACGGTACAGL-CCAGAOAATTATTETCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGG 

. . . . . . . . . .. . . i . i ■ i i i i i ii 
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! I t I I I 1 1 I t 1 I I 1 1 I 1 I 
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NFT-'OGGCTCAGTGCCGGACTTTATTGGCTGGGATAGTGCAGCAACAGCAACAGCTGTTGGACGTGGTCAAGA 
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. . . . , ...... I II t 1 II 1 II 1 
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RAOAACAAGAATTGTTRCGACTSACCGTCTGGRGAACAAAGAACCTCCAGACTAGGGTCACTGCCATCGAGA 
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GATT GCT A AAGO 1 AT C.A AC AG? 5TCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGC 
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AGTACTTAAAGGACCAGGCGGAGCTGA' 
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CA p '.iGCCAAAT':-'; 'A ACT-CTA ACACCAGACTGGAACA-AT-GA-T A- 
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AGCGAAAGGTTGACTTCTTGGAGGnAAATATAACA—GCCCTCCTAGAAGAGGCACAAATTCAACAAGAGAAG 
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! ,'Ti '■-TATft f ftO -4 »HTi-VFTCft I'ftftTGATftGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTC 

i i i i i 
i i i ■ i 
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0 . 00 

0 

4 

KYVPyp'5 

Human 

immunodeficiency 
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The original LAV, sometimes called LAV-1 to distinguish it from
HIV-2 (LAV-2), is now referred to as HIV-1bru. An infectious clone
of the virus has been constructed by Keith Peden, Molecular Biology
and Genetics, Johns Hopkins University School of Medicine,
Baltimore, MD 21205 (301) 955-3B52. HIVNL43 is also an infectious
clone having for its 3' half a clone of the BRU isolate.

Acquired immune deficiency syndrome (AIDS) is caused by a
retrovirus known by several different names, probably representing
two separate strains: human T-cell lymphotropic virus-III
(HTLV-III) and lymphadenopathy-associated virus (LAV) are thought
to be one strain, and AIDS-associated retrovirus type 2 (ARV-2) the
other. All three viruses, whose sequences do not differ by more
than about 6%, are believed to belong to the retroviral subfamily
Lentiviridae, or "slow" viruses.

For the details of the annotation and for other pertinent
references, see the HIV reference entry,
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: 183 5. 4578 poi polyprotein (NH2-terminus uncertain; AA at 

1631 ) 

4323 5205. vif protein 

514 1 54351 vpr protein 

5412 5S2E tat protein, exon 2 (first expressed exon) 
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signal 9705 9210 mRf 

BASE COUNT 3289 a 1656 c 2232 g 2052 t

ORIGIN Cap site of genomic RNA.


1749 atttcttcagagcagaccagagccaacagccccaccag in C2] 

ag in Cl] 

9210 mRNA polyadenylation signal


Initial Score    = 2341  Optimized Score = 2456  Significance = 0.00
Residue Identity =  99%  Matches         = 2456  Mismatches   =    3
Gaps             =    1  Conservative Substitutions           =    0
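
The summary figures for this hit appear to be linked by simple arithmetic: 2456 matches, 3 mismatches and 1 gap add up to 2460 aligned columns, the full length of the query, and 2456/2460 is about 99.8%, which the report shows as 99%. The sketch below reproduces that calculation. It assumes that gaps count as aligned columns and that the percentage is truncated rather than rounded. Both are inferences from the printed numbers, not a documented formula of the search program, and the function name `residue_identity` is hypothetical.

```python
def residue_identity(matches: int, mismatches: int, gaps: int) -> int:
    """Percent identity over all aligned columns, truncated to a whole percent.

    Assumption: every match, mismatch and gap occupies one alignment column.
    """
    aligned_columns = matches + mismatches + gaps
    if aligned_columns <= 0:
        raise ValueError("alignment has no columns")
    # Integer floor division truncates, e.g. 99.8 becomes 99.
    return (100 * matches) // aligned_columns


# Figures printed for the hit above: 2456 matches, 3 mismatches, 1 gap.
print(residue_identity(2456, 3, 1))
```

Under these assumptions the call prints 99, in agreement with the Residue Identity shown for this alignment.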
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AAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGGGGTGGAAATGGG 
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AAGAGCAGAASACMGTGGCAATGAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGGGGTGGAAATGGG 
X 5790 5800 5810 5820 5830 5840 5850
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GCACCATGCTCCTTGGGATATTGATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAGTCTATTATGGGG 
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GCTGTTTCAATf'-.TCAGCACAAGCA T AAGAGGTAAGGTGCAGAAAGAAT ATGCATTTTTTT AT AAACTTGAT A 
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GT r ’.' ,/ >PPi!?iT''H 11TTP rGARnnAA iTCCCATAQATTATTGT6CCCCGSCTGGTTTTQCGATTCTAAAATQTA 
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6TT03AGTAAVAAAiOTCTGGAACAGATTTGGAA 1 AACATGACCTGGATGGAGTGGGACAGAGAAATTAACA 
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ATT." 'O ACf 7 ti• ..C i TAA'i ACATTCCTTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTAT 
7730 7740 7750 7760 7770 7780 7790


2020 2030 2040 2050 2060 2070 2080

TGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTAACATAACAAATTGGCTGTGGTATATAAAAATAT
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TGGA ATTAf • .ATT. AAT6GGC AAG TTT tiTGG AAT t*GGT TTAACATAACAAATTGGCTGTGGTATATAAAAATAT 
7800 7810 7820 7830 7840 7850 7860 7870

2090 2100 2110 2120 2130 2140 2150 2160

TCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTTA
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TCA'i 'AA'i ir;f vr.--,6 t'ARO ASGCTTGC-TASGT TT AAGAAT AGTTTTTGCTGT ACTTTCT AT AGTGAAT AGAGTT A 
7880 7890 7900 7910 7920 7930 7940


2170 2180 2190 2200 2210 2220 2230

GGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACAGGCCCGAAGGAA
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rSGCA: iiSA’ rA fTCACCATTATCGT 'i '7CAGacccacctcccaaccccgaggggacccgacaggcccgaaggaa 

7950 7960 7970 7980 7990 8000 8010


2240 2250 2260 2270 2280 2290 2300

TAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCCTTAGCACTTATCT
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IP, ■> v. , r , v i ■ ,th '->,1 iitfr.AC-IAC AGAGACMEATCCATTCGIATTAGTGAACGGATCCTTAGCACTTATCT 

8020 8030 8040 8050 8060 8070 8080


2310 2320 2330 2340 2350 2360 2370

GGGACGATCTGCGGAGCCTTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGG


RG r7 ’ /- 'Cni r> "! > ’“ ; ‘I V ,r Ai 7 1 i' ' 1 r "* i' - fCT l CAGCTACGACCGCT TGAGAGACTTACTCTTGATTGTAACGAGG 

8090 8100 8110 8120 8130 8140 8150

2380 2390 2400 2410 2420 2430 2440

ATTGTGGAACTTCTGGGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGT
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’ : ' 7.'7; C-^7;Hp007:Rrii;TiiEs0sCCCTCflPif=iT ATTGGTGGAATCTCCT ACAGTATTGGAGT 


8160 8170 8180 8190 8200 8210 8220


2460 >•: 

CAGGAACTAAAG
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CAGGAACT m A AG 
8230 8240 
HIVMNCG Human immunodeficiency virus type 1, isolate MN, c

LOCUS       HIVMNCG 9738 bp ss-RNA VRL 15-JUN-1989
DEFINITION  Human immunodeficiency virus type 1, isolate MN, complete genome.
ACCESSION   M17449
KEYWORDS
SOURCE      Human immunodeficiency virus type 1 (HIV-1), isolate MN, proviral
            DNA.
ORGANISM    Human immunodeficiency virus type 1
            Viridae; ss-RNA enveloped viruses; Retroviridae;
            Lentivirinae.
REFERENCE   1 (bases 1 to 9738)
AUTHORS     Gurgo,C., Guo,H.-G., Franchini,G., Aldovini,A., Collalti,E.,
            Farrell,K., Wong-Staal,F., Gallo,R.C. and Reitz,M.S.Jr.
TITLE       Envelope sequences of two new United States HIV-1 isolates
JOURNAL     Virology 164, 531-536 (1988)
STANDARD    full staff_review
COMMENT
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OjwCJ 

86c ‘V 


5076 

614 4 

OSp i\ 

6239 

5809 

pepf 

, >0 o 

vJO 

9359 

p*, -’ui'caC; 

43 4 

8633 

pTO - 

434 

S855 

I V 5 

V40 


IV- 

600:0 

3395 

tv :. 

(-OK* > 

8'<" y > 


Kindly provided in computer readable form by Marv Reitz,
Bethesda, MD. 20892 U.S.A.

isolate was taken from a pediatric AIDS patient in 1984.
coding sequence shows an in-frame stop codon at position


description
gag polyprotein
pol polyprotein (NH2-terminus uncertain; AA at
2091; in-frame stop codon at 3783)
vif protein
vpr protein
tat protein, exon 2 (first expressed exon)
tat protein, exon 3 (AA at 8397)
rev protein, exon 2 (first expressed exon)
rev protein, exon 3 (AA at 8398)
vpu protein (premature termination)
envelope polyprotein
nef protein (premature termination at 9357
relative to other HIV-1 sequences)
genomic mRNA
tat, rev, nef subgenomic mRNA
tat, rev, nef subgenomic mRNA intron 1
tat cds intron
T-C.V/ rHq _urj±JCCia. 
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iv.*; 

SOSO 

,">*7.0 >iT 

tats rev « nef subgenomic mRNA 

LTR 

y 

S33 

5’ LTR 


LTr: 

□ 1 OS 

3730 

3’ LTR 


rp‘» 

45a 

550 

R repeat 5' copy


rp'! 

N5S0 

E533 

R repeat 3’ copy 


b 1 Tn j .1 ngj 

374; 

3 0,0 

Sp1 binding site 

III 

b ?. v ib i ncs 

7C‘5' 

■_>0 : 

368 

Sp1 binding site 

II 

b :l i id ;i ng 

338 

407 

Sp1 binding site 

I 

b i 'fib i ng 

635 

653 

primer (Lys-tRNA) binding site

site 

3783 

371? 5 

pol cds in-frame stop codon

signal 

3631 

8834 

mRNA polyadenylation signal 

BASE COUNT 3443 a 1789 c 2344 g 2162 t
Left end of viral genome


Initial Score    = [illegible]  Optimized Score = 2223  Significance = 0.00
Residue Identity =  90%  Matches         = 2253  Mismatches   =  170
Gaps             =   58  Conservative Substitutions           =    0


x 10 20 30 40 50 60 70 

AAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGGGGTGGAAATGGG

i , i t i i i r i i i » i i t i i i i i * i i i i i * * 1 * * 1 * * * 1 11 * 11 ! ! ! ! ! ! ! ! ! ! 
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fy-ypppC^rpnOtCAGTGGCfY-'frGAGASTGAAGG-GGATCAGGAGGAATTAT-CAG-CACTGGTGGGGATGGG 
6220 6230 6240 6250 6260 6270 6280 

80 90 100 110 120 130 140

GCACCATGCTCCTTGGGATATTGATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAGTCTATTATGGGG

; ; ;: ; ; ; ; :i ::::::: i :: : 

GCPiCG.-.TGCT'JG TTGC G fTAT f A.'-Y'. GAT CTGTAGTGCTACAGAAAAATTGTGGGTCACAGTCTATTATGGGG 
6290 6300 6310 6320 6330 6340 6350

150 160 170 180 190 200 210

TACCTGTGTGGAAGGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTAC

i : i I j i i I ; > i i ;• j; ■ 5 ;;;• ji i: i : 

TACt.’TGYG i GGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTAC 

6360 6370 6380 6390 6400 6410 6420 6430

220 230 240 250 260 270 280

ATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGTAAATGTGA

; j ; : i I ; 1 ! I ; ! I ! ! ! ; i ! I ! , ;. ........ .* < * < *.*.. 

ATAA 7GT7 rGGGCCAC.ACAAGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGAATTGGTAAATGTGA 
6440 6450 6460 6470 6480 6490 6500

290 300 310 320 330 340 350 360

CAGAAAATTTTAACATGTGGAAAAATGACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATC

. . . ff ifiTi.ji .I.’,.,.?., i , , , , , , , , , , i , i r . . . . . . i i t * i t i * » • > I I I I l . I . i i i t > * 
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CAGAAAAT' fT'TAACA’ l GTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATC 
6510 6520 6530 6540 6550 6560 6570 


370 380 390 400 410 420 430

AAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTAGTTTAAAGTGCACTGATTTGGGGAATGCTA


... i. ill. i . i t » i i i i i i 


ill i t i . . ill 


AAA.Gi TTAAAG: Oft'TT fdTAAAAO i AA.CCCCACTCTGTGTTACTTTAAATTGCACTGATTTGAGGAAT ACT A 
6580 6590 6600 6610 6620 6630 6640


440 450 460 470 480 490 

CTAATACCAATACTAG-TAATACCAATAGTAGTAGCGGGGAAATGATGATGGAGAAAGGAGAGATAA 

. - ii> t t ii ii i it . . 1 * * • ' 11 * 


6650 6660 6670 6680 6690 6700 6710


500 510 520 530 540 550 560 570
570 
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J t *3 VT i'Cka ; Y3~CACCAUAAG3 ATAAGAuATA AG A l GiCAu A AAbAATATGCACTTCTTTATAAAC 
6720 6730 6740 6750 6760 6770 6780



580 590 600 610 620 630 640

TTC-'^' r ATA:V, rATnAA'ATTATAATGATACTACCAECTATACGTTGACAAGTTGTAACACCTCAGTCATTACAC 


l l i l i i l t i i l l i 


TTixf'.'fATAGTAl CAAfi A.&JYI Y-'.ATGATA.GTACCAGCTATAGGTT6ATAAGTTGTAATACCTCAGTCATTACAC 
6790 6800 6810 6820 6830 6840 6850 6860

650 660 670 680 690 700 710

AGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAA
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rtP'li'; ,'Tb li '( nVCCYTTf.’r ;GCf ^ATTCCCftTACACTATTGTCaCCCCGGCTGGTTTTGCGATTCTAA 

6870 6880 6890 6900 6910 6920 6930

720 730 740 750 760 770 780

AATGTAATAATAAGACGTTCAATGGAACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAA


t i i i i i i i 
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AATL!^fAACGl ;7,W;AA4:::T7Cr,GTC.'GAAAAGGA rCATGTAAAAATGTCAGCACAGTACAATGTACACATGGAA 
6940 6950 6960 6970 6980 6990 7000


790 800 810 820 830 840 850

TTAGGCCAGTAGTATCAACTCAACTGCTGTTGAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTG


TTAGGl’CA! .;TAO; :‘A T 7:AAC7CAA0‘TGCTG' 
77! i. 0 7020 70.30 
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TAAATGGCAGTCT AGCAGAAGAAGAGGT AGT AATT AGATCTG 
7040 7050 7060 7070 


860 870 880 890 900 910 920 930 

CCAATTTCACAGACAATGCTAAAACCATAATAGTACAGCTGAACCAATCTGTAGAAATTAATTGTACAAGAC
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A1740TTTC:AC790TA ,v T&CTAAAACCATCATAGTACATCTGAATGAATCTGTACAAATTAATTGTACAAGAC 
7080 7090 7100 7110 7120 7130 7140


940 950 960 970 980 990 1000 

CCAACAACAATACAAGAAAAAGTATCCGTATCCAGAGGGGACCAGGGAGAGCATTTGTTACAATAGGAAA--


til i i 


CCAACTACAATAAAAGAAAAAGGAYACATAT—AS- 

7150 7160 7170 7180


-GACCAGGGAGAGCATTTTATACAACAAAAAATA 
7190 7200 7210


1010 1020 1030 1040 1050 1060 1070

-AATAGGAAATATGAGACAAGCACATTGTAACATTAGTAGAGCAAAATGCAATGCCACTTTAAAACAGATAG


i i i i i i i i i i i i i i iiii t t i i i l i i l 
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TA ATAGGA^C 7 ' AT AvAG ACA AGC ACATTG V A AC ATTAGT AGAGCA AA ATGG AATGACACTTT AAGAC AGATAG 
7220 7230 7240 7250 7260 7270 7280


1080 1090 1100 1110 1120 1130 1140

CTAGCAAATTAAGAGAACAATTTGGAAATAATAAAACAATAATCTTTAAGCAATCCTCAGGAGGGGACCCAG

; :’Y; ; ; ; :; 

T TAu CA'.AA fTAAAAGAACAATTT AAGAATAAAACAATAGTCTTTAATCAATCCTCAGGAGGGGACCCAG 

7290 7300 7310 7320 7330 7340 7350

1150 1160 1170 1180 1190 1200 1210

AAATTGTAACGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTA
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AAA' rS TA/Ti '-•CACr iT'f 'rT'i'AA"! 'T-iTEGAbiGGBAATTTTTCTACTGTAATACATCACCACTGTTTAATAGTA 
7360 7370 7380 7390 7400 7410 7420


1220 1230 1240 1250 1260 1270 1280

CTTGG-fT VAATAGTftCITSG— —AGTACTGAAGGGTCAAATAACACTGAAGGAAGTGACACAATCACAC 


CTTCGAAT'GGTAATAATAnTTGSAATAATACTACAGGGTCAAATAACAAT- 
7430 7440 7450 7460 7470 


—ATCACAC 
7480 


1290 1300 1310 1320 1330 1340 1350

TCCCATGCAGAATAAAACAATTTATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCA
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TTC'’! •iT7.TA:-i' : ’‘ -taaaacAAATT -'.TAAACATG- I GGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATTG 
7490 7500 7510 7520 7530 7540 7550







1360 1370 1380 1390 1400 1410 1420

GCGGACP. A.'T<TAGATGTiTCATCAft AT ATTACAGGGCTGCTATT AACAAGAGATGGTGGT AATAACA-AC- 
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AAGCMtJAAA' iTAGATG; i fCATCAP. AT AT' fACAGGGCTACTATTAACAAGAGATGGTGGTAAGGACACGGACA 

7560 7570 7580 7590 7600 7610 7620

1430 1440 1450 1460 1470 1480 1490

—00 YGGG fCCGAGA I 'rnTCAGACCTGGASGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAAT 
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CGAACGAC:ACCLAGATCTTCASftCCTGGAGGAl3GAGATATGAGG6ACAATTG6AGAAGTGAATTATATAAAT 
7630 7640 7650 7660 7670 7680 7690 7700 

1500 1510 1520 1530 1540 1550 1560

ATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAA


ATAAAGTAfTTAAC J AAT TGSAACG AT fAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAA 
7710 7720 7730 7740 7750 7760 7770

1570 1580 1590 1600 1610 1620 1630

AAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCACGGT
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AAATAGCAGCG ATAGGAGCTCTGTTCGTTGGGTTCTT AGGAGCAGCAG6AAGCACT ATGGGCGCAGCGT 

7780 7790 7800 7810 7820 7830 7840

1640 1650 1660 1670 1680 1690 1700

CAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGG
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CAG7GA0T*C73ACG3TACA£GGCAG;ACTATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGG 
7850 7860 7870 7880 7890 7900 7910

1710 1720 1730 1740 1750 1760 1770

CTATTGAGGCGCAACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGG


l I ! i i l l l i l i 


l i • t J i i 1 i i 


t i i l i i i i l i i i i i i 


CCA' i fGAGGCGCAACAiSCATATGH' TGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGG 
7920 7930 7940 7950 7960 7970 7980

1780 1790 1800 1810 1820 1830 1840 1850

CTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCA
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CTGTGGAAAirATACCTAAAG^ATCAACAGCTCCTGGGGTTTTGGGGTTGCTCTGGAAAACTCATTTGCACCA 
7990 8000 8010 8020 8030 8040 8050

1860 1870 1880 1890 1900 1910 1920

CTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGGAATAACATGACCTGGATGG
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CTAT' f ETtXCT TGGAATGGTAGTTGGAGTA AT AAATCT CTGGATGAT ATTTGGAATAACATGACCTGGATGC 
8060 8070 8080 8090 8100 8110 8120 

1930 1940 1950 1960 1970 1980 1990

AGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACATTCCTTAATTGAAGAATCGCAAAACCAGCAAG
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AGTC.GGAAAia AGA AATTGACAATTACACAAGCTT AATATACTCATT ACT AGAAAAATCGCAAACCCAACAAG 
8130 8140 8150 8160 8170 8180 8190 8200


2000 2010 2020 2030 2040 2050 2060

AAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTAACATAACAAATT


AAAAtiiAATiTAAC:AA5.AATT 


1.AATTGGATAAATGGGCAAGTTTGTGGAATTGGTTTGACATAACAAATT 

8230 8240 8250 8260 8270 


2070 2080 2090 2100 2110 2120 2130

GGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTAC


i t t i t i t i i i i i i i i i i l i i i i l l i 


GGC'itrrGC-r.TT'T.T^AAMATATTCA.TAAIGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTAC 
8280 8290 8300 8310 8320 8330 8340







2140 2150 2160 2170 2180 2190 2200 2210

TTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGG

1 ! * ^ ‘ * 1 ‘ ' * w " ‘ .. i i i i i t i it i i i i i i i i i i t 


TTTCTATAi /TDVV-VTAEAG: TARteCAESCr. 
8350 8360 8370 8380 8390 8400 8410


2220 2230 2240 2250 2260 2270 2280

GACCCGACAGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGA

: : ;TTTV,:;: : : ; :: : : ::: 

GACCOGACA£S r tlC 6 AMbiCiiAATCi 3 :AAt 5 AAGAAGGTGKaAGAGAGAGACAGAGACACATCCGGTCGATTAGTGC 
8420 8430 8440 8450 8460 8470 8480

2290 2300 2310 2320 2330 2340 2350

ACGGATCCTTAGCACTTATCTGGGACGATCTGCGGAGCCTTGTGCCTCTTCAGCTACCACCGCTTGAGAGAC

; "T :*:! : : i ; : :: 

ATGCv/VfTCT~! Y z n.CAATTATCTGGGi rCGACCTGiCGGAGCC—TGT I CCTCTTCAGCTACCACCAC AGAGAC 

8490 8500 Q5lO 3520 8530 8540 8550 

2360 2370 2380 2390 2400 2410 2420

TTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGG

1 ‘' ’ .. , * . , . i . i i i i i i i i i i i i i i i i i i i i i t i i » i • » 


TTACTCTTi -lATT* -XAUCGAGlT:ATT GTRGAACTTGTGG 


8560 8570 8580 8590 8600

2430 2440 2450 2460



ibiLa I LabilaRRia 
8610 


8620 


AATCTCCTACAGTATTGGAGTCAGGAACTAAAG
i i :i i 

AATC”i'CCTACAG: fATTRGAGTCAGGAAC FAAAG 
8630 8640 8650 8660 X


3. KUNZ-158- 

HIVCDC42 


Human immunodeficiency virus type 1, isolate CDC-4

LOCUS       HIVCDC42 3373 bp ss-RNA VRL 15-JUN-1989
DEFINITION  Human immunodeficiency virus type
            rev, env and nef genes.
ACCESSION   M13137
KEYWORDS    env gene; tat gene.
SEGMENT     2 of 2
SOURCE      Human immunodeficiency virus type
            unintegrated circular proviral DNA.
ORGANISM    Human immunodeficiency virus type
            Viridae; ss-RNA enveloped viruses;


REFERENCE 

AUTHORS 

TITLE 


JOURNAL 

STANDARD 

COMMENT 


feature::. 

pept 

pep t 


pepx, ps 
P-'i ^ 

,y - 


Lentivirinae.

1 (bases 1 to 3373)

Desai,S.M., Kalyanaraman,V.S., Casey,J.M., Srinivasan,A.,
Andersen,P.R. and Devare,S.G.

Molecular cloning and primary nucleotide sequence analysis of a
distinct human immunodeficiency virus isolate reveal significant
divergence in its genomic sequences

Proc. Natl. Acad. Sci. U.S.A. 83, 8380-8384 (1986)

full staff_review

Kindly submitted in computer-readable form by Ell. The normal start
codon for the nef gene is not present; the ATG at 3142 may serve
this role.

from to/span description

( \ 5 P vpr protein, partial (AA at 2)

3 £; 302 tat protein, exon 2 (first expressed exon)

2SS7 2757 tat protein, exon 3 (AA at 2668)

227 302 rev protein, exon 2 (first expressed exon)

7257 2?/-!-i rev protein, exon 3 (AA at 2669)

3 i55b vpu protein (in-frame stops at bases 451 and 484)

e±‘77 305x envelope polyprotein

PIMft 



pr~—Tiisc'i ( 
I VS ” < 

IV3 
I VS 
I VS 

BASE COUNT 
ORIGIN 1 


> 3373 tat* rev, nef subgenomic mRNA 

i 34 tat, rev, nef subgenomic mRNA intron 1

303 2686 tat cds intron 2

303 2SSS rev cds intron 2

303 2S£:G tat, rev, nef subgenomic mRNA intron 2

.174 a 531 ; c 90S g SOS t 

upstream of EcoRI site; about 3.6 kb after segment 1.


Initial Score    =  932  Optimized Score = 2209  Significance = 0.00
Residue Identity = [illegible]  Matches  = 2241  Mismatches   =  196
Gaps             =   60  Conservative Substitutions           =    0


X 10 20 30 40 50 60 70 

AAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGGGGTGGAAATGGG 

, - . , i , I , t • : I * : I : J t 1 t I : t t t i t i t > I i i I i t i i i it t i I . t i i t i t 

, i : ( i i i i t t i i i i t t i i i r t i t i t t t t i i t t i i < till it i i i i i i i i t i 

AAGAUAftiYAftuACfiAiGGCAATGAuiAGCGAAGG-GGATCAGGAAGAATTGT-CAG-CACTTGTGGAGATGGG 


470 480 490 500 510 520 530

80 90 100 110 120 130 140


GCACCATGCTCCTTGGGATATTGATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAGTCTATTATGGGG

, , , . i , • . : i : i j j ; : ; i ; t ; i t i t : i i i i i » I i i ill i i i t t t i ( i i i i I i i i » i i I i i t i I 

, , , , i ; ( i , ! : t i - i i ri i : . ; t r i i i » r l t t i i I t lit .I i ( l i i i I I I I I I I I I I ( i I ( I I 

GC ACC AT GOT C C f 7 GGAATG'f TGATGATCTGTAGTG-iCTGCAGCAAACTTGTGGGTCACAGTCTATTATGGGG 
540 550 560 570 580 590 600 

150 160 170 180 190 200 210

TACCTGTGTGGAAGGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTAC

, , , { 1 : t I 1 l I i , ; I i t i t ! r I I t t i t : ' I » t I * t t I I I I I l I I I t t * I I t * ( » » 1 * • ' 1 1 ' 1 1 1 1 1 1 11 

i t i i i j i i i t i t i i ■ t t i i t l i I i » t : * : l * • i < i i i l l * • * * 1 1 * 1 1 * 1 1 * 1 * * 1 1 1 1 1 1 1 1 1 1 1 1 * 11 

1‘ACCTGTG Yi ;.~Y -AAGAAiiCAACCAi 1CAC f CT ATTTTGTGCATCAGATGCT AAAGCAT ATGAT ACAGAGGCAC 
610 620 630 640 650 660 670


220 230 240 250 260 270 280

ATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGTAAATGTGA

, , , . ; I * : . J , 1 . : 1 : i t i . r i i i i i 1 : I 1 t I i j I I t i I i t I I l i .. i i i i » t » i i i i i i i t 

, i , T i ; i i j : i ; [ : j t t t i i ! t i I i i i i i i i i I I • i i i i i i t i t t i i i i i i i i i i i i i i • * * ( i i i i i t i 

ATAA l GTT TC-SO 1CCACACATGCCT GTGTACCCACAAACCCCAACCCACAAGAAGTAGTATTGGAAAATGTGA 
680 690 700 710 720 730 740


290 300 310 320 330 340 350 360

CAGAAAATTTTAACATGTGGAAAAATGACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGATC


t 1 1 I 1 t t t ! I 


CAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGCTTATGGGATC 
750 760 770 780 790 800 810 


370 380 390 400 410 420 430 

AAAGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTAGTTTAAAGTGCACTGATTTGGGGAATGCTA

, , t i i t i i r I i i I i i i 1 I i i t i t t i I (» i i I I I I I I i * » t I 1 i i I i i I I t I I 1 » t I * * J IJ 

, I I » 1 ; ( 1 t : t t ! : t I I 1 i i ! : i i l i ! i i i t i i t i I i i t f : I t t i i i t i i i i i t t t i i > i » t i 

AAAECCTAAASi:CATS7 GTAAAACTAACCCCACTCTGTGTTACTTTAAATTGCACTGATTTGAATACTAATA 
820 830 840 850 860 870 880 890


440 450 460 470 480 490 500 

CTAATACCAATACTA--GTAATACCAATAGTAGTAGCGGGGAAATGATGATGGAGAAAGGAGAGATAAAAAA


ill ii i ii 


ATACTACTAAT ACTAC f GAACTATCAATAATAGTAGTTTGGGAACAACG—GGGTAAAGGAGAAATGAGAAA 
900 910 920 930 940 950 960

510 520 530 540 550 560 570

CTGCTCTTTCAATATCAGCACAAGCATAAGAGGTAAGGTGCAGAAAGAATATGCATTTTTTTATAAACTTGA

itt i i t i t t t i i t t ( t i l i i l t l I 1 I i I i i 1 I I > I I I I I i i I I l l I l I l I 1 I I 

lit j , i i i ; r i i t t t t J I I r I I i i i l I t i i l i i t i i t l l i t i i t 1 l l i t l i l i i I i I I I » I I i i i I 

CTG7TCTTTCAATATCACCACAAGCATAAGAGATAAGGTGCAGAGAGAATATGCATTGTTTTATAAACTTGA 
970 980 990 1000 1010 1020 1030


580 

TATAATACf 74A I AG- 


530 600 

—ATAATGATACTACCAGC- 


610 620 630 

-T AT ACGTTGACAAGTTGT AACACCTC 


•rGTA 0 AACCAATP,GA 1 GATAATAAAAATACTACCAACAACACCAAATATAGGTTGATAAATTGTAACACCTC 

1040 1050 1060 1070 1080 1090 1100




640 650 660 670 680 690 700

AGTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTT

, t f , ■ i , • ] i - : i * : i ! t ; ; j : i i i t i I \ l t t ( t t t i l i t i i t i i i i i i i i t i t t i t i t I i i i l 

t t i • i t i ! t t I ‘ t i i I t I i ( ( t j ( : i t t t i t i i i t i t i i i i t t i i i > » < i • ... t i i » i ******* 

AGTCATTACACARC-CCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTACCCCGACTGGTTT 
1110 1120 1130 1140 1150 1160 1170

710 720 730 740 750 760 770

TGCGATTCTAAAATGTAATAATAAGACGTTCAATGGAACAGGACCATGTACAAATGTCAGCACAGTACAATG

l t i 1 < s > : < s i i t i t ■ i i i > i i i i i l i i l ■ ..Ill.. i i i i i l i l i l i i 

* , t i i l i r i i ; i t i t l i i i r t t i r t i t i i i I i i ( * • I i t I i i t t t i i i i t • I I l I I I I I i l < < i I 

TGCACTTC rAAARTGT AACSATAi'-.GiAAGTTCAATGGGACAGGACCATGTACAAATGTCAGCACAGTACAATG 
1180 1190 1200 1210 1220 1230 1240

780 790 800 810 820 830 840

TACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTGAATGGCAGTCTAGCAGAAGAAGAGGTAGT

t , , , t t l , t l t t i > 1 ; i : l t l l 1 l I t i t 1 t t t I l t i i t l I I * * * » 1 * * * * > »*»>*»* » 1 •*•»***** • 

, , * I t T t 1 1 1 t 1 | t t t 1 ( I | ( t 1 | | » 1 1 I | ! I I I t I I I I t t I I t t I I I I I I I I I t f I • 

TACACATGGAATTAGGCCAGiTAGTGTCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGGTAGT 
1250 1260 1270 1280 1290 1300 1310 1320 

850 860 870 880 890 900 910 

AATTAGATCTGCCAATTTCACAGACAATGCTAAAACCATAATAGTACAGCTGAACCAATCTGTAGAAATTAA 

, ( ( , , , , , t • , t e i i t t < i I i : t I i c I t i t I I t i ... I I I i I I I I I I ) I I I I I I I t I 

iiiiitlllt’ i 1 I 1 I I t 1 i t I t 1 i * I i I t « t I I I • I I t I I i I 1 i * * * < t t( i i i i l i i i i l i * i 

AATTAGATCTGAAAATYTCACGAACAATGCTAAAACCATAATAGTACAGCTGAATGTATCTGTAGAAATTAA 
1330 1340 1350 1360 1370 1380 1390 

920 930 940 950 960 970 980 990

TTGTACAAGACCCAACAACAATACAAGAAAAAGTATCCGTATCCAGAGGGGACCAGGGAGAGCATTTGTTAC 

i j i i i ; i ( i i » i t i i i i i t iirtiititiiit- tit i i i l i t l i t l t t t t l tl lit 

t ; t t i t i : i t : * i i t i i ! i i : ; : i i i t i i i t t i i i i i t i i i i i t t i i i i t it t t i 

TTGTACAAGACCCAACAACCATACAAGAAAAAG-GGTAAC—GCT AGGACCAGGGAGAGTATGGTATAC 

1400 1410 1420 1430 1440 1450 

1000 1010 1020 1030 1040 1050 1060

AATAGGAAAAA-TAGGAAATATGAGACAAGCACATTGTAACATTAGTAGAGCAAAATGCAATGCCACTTT 

, , t ; . ; III 1 I f 1 t I t I I ! t I I 1 t 1 I I I l I I l t t I I i I t 1 i I I I I. Ill I I I I I I 

t , tilt ► t t t . | ( i ( t i i it I l l l I l I I l ■ I t I i I I > t > ■ I I t t I I I lilt III I I I I I I 

AACAGGAGAA47 ACT AbiGAAATATAAGGCAAGCACATTGT AACATT AGT AGAGCACAATGGAAT AACACTTT 
1460 1470 1480 1490 1500 1510 1520 1530


1070 1080 1090 1100 1110 1120 1130

AAAACAGATAGCTAGCAAATTAAGAGAACAATTTGGAAATAATAAAACAATAATCTTTAAGCAATCCTCAGG


I I t I 1 ! I t t I 


ACAACAGAYASCfACAACCTTAASAGAACAATTTGG-GAATAAAACAATAGCCTTTAATCAATCCTCAGG 

1540 1550 1560 1570 1580 1590 1600


1140 1150 1160 1170 1180 1190 1200

AGGGGACCCAGAAATTGTAACGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACT


i i 1 t i i i i i i 


AGGGGACCCAGA.A ATTGT AATGCACAGTTTT AATTGTGGAGGGGAATTTTTCT ACTGT AATTCAACACAGCT 
1610 1620 1630 1640 1650 1660 1670 

1210 1220 1230 1240 1250 1260

GTTTAATAGTACTTE-GTT-TAATAGTACTTGGAG-TACT—GAAGGGTCAAATA—ACACTGAA 

i , t t : i :ii t t : t i t i i i i t ■ ■ i i t t i ttt i till i i i t i i t 

titittiit >iii > t i i t . t i t > i t i t > t i t t t i tit i i » i i i i t t » i i 

GTTTAAT AGCGCTTGSAATGTTACTAGTAATGGTACTTGGAGTGTTACTAGAAAG—CAAAAAGACACTG— 
1680 1690 1700 1710 1720 1730 1740 

1270 1280 1290 1300 1310 1320 1330

GGAAGTGACACAATCACACTCCCATGCAGAATAAAACAATTTATAAACATGTGGCAGGAAGTAGGAAAAGCA 

t tttt iittitititiittttiiiiittiiii ttiiiiiit itititii itiiiiiiiiii 
i itii i t t t i t t i i i i t i i i • t t i i i i t i i i i i i t i i i i i i i t » i t i i i i i i i i i i i i i i 

-BAGACAT T ATCACACTCCCATGCAGAATAAAACAAATTATAAACAGGTGGCAGGTTGTAGGAAAAGCA 

1750 1760 1770 1780 1790 1800


1340 1350 1360 1370 1380 1390 1400

ATGTATGCCCCTCCCATCAGCGGACAAATTAGATGTTCATCAAATATTACAGGGCTGCTATTAACAAGAGAT

, f , i • | j i i ; t 1 t ! t t I t 1 I t i t I I I I I t I I I t I i i t t I I I I I I I 1 I t I I 1 I l I I I I I I I t i * 1 I I 

t r t t t t i i i t f t tttt > i ■ i t 1 i t t i i t i t l l i l i l i i i i i i t t t t t i l i l l i * l » i i i i i i 

ATGTATGCCCTTCCCATCAAA66ACTAATTAGATGTTCATCAAATATTACAGGGCTGCTATTAACAAGAGAT 
1810 1820 1830 1840 1850 1860 1870 1880




1410 1420 1430 1440 1450 1460 1470

GGTGGTAATAACAACAATGGGTCCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGT

i i i i i : i i i i i i i i t i i i t i i i r i i i t i i < t • i t i i i t i i i r i i i i i t i i i < i i i i i t > 

i i i i i i i lit i t i i i t « t * : i i » i t t t t t i i i t i i t i i t t i i i i i t i t i i i i i i t i i i i i 

GGTGfvTGG:TGP,eAftCCftGftCCACCRftSATCTTTftGACCTGeASGAGGAGATATGAGK3GACAATT6GftGiAAGT 
1890 1900 1910 1920 1930 1940 1950

1480 1490 1500 1510 1520 1530 1540 1550

GAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTG

i ■ t i i j i i : i i t i i ’ t j i : : i » : i r i i i i t i i t i i t i i i t t i i t i i i i i i i i t i t t l i i l i i i t i t t l 

i i i i i i i i i i j : i i i i i : i i t t t i i i i i « t i i i i i i i i t i i » i t i i i i t t i i i i i i * t i l i ( i i i i i t t i t 

GAATTAT AT AAA TAT A AAGT AGTA AAAATCGAACCATTAGGAGTAGC ACCCACCAAGGCAAAGAGAAGAGTG 
1960 1970 1980 1990 2000 2010 2020


1560 1570 1580 1590 1600 1610

GTGCAQAGAGAAAAAAGAGCAGTC-ir-.lGAA-T AGGAGCTTTGTTCCTTGGGTT CTTGGGAGCAGCAGGAAGC 

i i i t p i i t i t i t i t t t i ( i t r i i t t t i i i < i i i i i i i i i i t i i i i i t i i t t i t ■ • t i i i t i i i t i i ■ i 
t i i t t i i i i t i i : t i i i t i i l t t t i i i i t i 1 i i i i t i i i i t i i t i i i I i i i i t ( i i ) I t I i i i i I i i i 

GTGCAGAGAGAAAAAAGAGCAGTGGGAATGCTAGGAGCTATGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGC 
2030 .2040 2050 2060 2070 2080 2090 


1620 1630 1640 1650 1660 1670 1680 1690

ACTATGGGCGCACGGTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAG

i t r t i i i i : i t i i i r t i t t t i > i i i i i t i t t i t i i i t t i i i i i i i i t i t » t t t t t i t i i i t i i i i t i 
I | i i t i i i t j t i t t i ; i t t t i > t t i i f t i i t t i i i t l i i i i i l t i t t t t i t i i l i l t i i i i i i i 

ACT ATGGiGCGCAACGTCAATGGCGCTSAGGGT ACAGGCCAGACAATT ATTGTCTGGT AT AGTGCAACAGCAA 
2100 2110 2120 2130 2140 2150 2160


1700 1710 1720 1730 1740 1750 1760

AACAATTTGCTGAGGGCTATTGAGGCGCAACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTC

i i t t i ! i i i : ; i i i j i i i : t i :: i : i i i t i i i i i i t t t i i t t i i t i i i i i i i i i i i i i i i i t i i i i i 
I t t t i : i t t * i i i t i I I I ! i 1 t ; i i I I t I I I i l I 1 ■ I I t t t » I t I I I I I I i I 1 l i I I I I I i t i I I t I i I 


AACAATTTGCTGAGAGCTATTAAGGCGCAACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTC
2170 2180 2190 2200 2210 2220 2230 2240


1770 1780 1790 1800 1810 1820 1830 

CAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGA

i : i ; t ; : ; r : ! i : i : : : i i i i * i i i ! i ! ) i I i t i i t i i 1 i i 1 1 i I » i I ill I ■ > I t i l i i i I I l I t l i 
| ) | j | t J t l ! ! ! I t 1 I 1 t L I 1 1 I I I I I I I I I I t I t I I t 1 t I I 1 « I 1 I * I * t t I I I t I t I 1 I I I 1 I t I 1 I ) 

CAGGCAAGAATCCTSil CTGTGGAAAGATACCTAAAGGATCAACAGCTCCTAGGGTTTTGGGGTTGCTCTGGA 
2250 2260 2270 2280 2290 2300 2310


1840 1850 1860 1870 1880 1890 1900

AAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGGAAT

t i i t I t i t t [ t 1 1 i . : . r i i 1 i : i : : i t l i 1 t i t ( I l l I 1 t l ! I I l I l l I 1 i I I I t t I I I I I I t I I I I I I 

i i i ,* i i i ' t t i : i i ; i ! j i i t : : i : i i i i t t t i t i t i i i t i t i i i i i t i t i i i t i i i i i t i i i t i i i t i i 

AAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAAACTCTGGATCAGATTTGGAAT 
2320 2330 2340 2350 2360 2370 2380


1910 1920 1930 1940 1950 1960 1970

AACATGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACATTCCTTAATTGAAGAA

i i t i * ; i i • t t i t t t i i t i i t i t t t i i i i i t t * i t t > i t i l f t » » i t i i i l i t i i i i i i i t t i i i 

i i i ; t r t : i * i : t r i ! t I l i 1 I i I 1 i i I t t l i i ittttiilttt I i t i I I I I i I t i i i I I I l I t i 

AACATGACCTGGATGGAGTGGGACAGAGAAATTGACAATTACACACACTTAATATACACTTTAATTGAAGAA
2390 2400 2410 2420 2430 2440 2450


1980 1990 2000 2010 2020 2030 2040 2050

TCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGG

i , l • t t i r t t : i i ! i t : t i : t i i i < i III t t i i t i i i t i t I t l i i i i i i > I t I i i t I i i i lilt 
i i t t i i i i t i i i • t i i t t r r t i : till lit i i i i i t i i i i i i i i i i i i i i i i i i ( i i i i i i lilt 

TCGCAAAACCAACAAGAAAAGAA 7 CAAUAGGAACTATTGCAATTAGATAAGTGGGCAAGTTTGTGGACTTGG 
2460 2470 2480 2490 2500 2510 2520


2060 2070 2080 2090 2100 2110 2120

TTTAACATAACAAATTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGA

i j t : i t : i i i i f t : i * t 1 t : • : i i i i i i i t r i i : i i t t i i » * * • i » * i * » i i > * 1 » i I i I I I I i i I i 

i i i i i • t i ; i i t i i i t : • i t : t t i i t l i t i t i t t : i i i t i l i i l i ■ i i l i > i l t i l l I I i I i i i t I I i 

TCTGACATAACAAAATGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGATAGGTTTAAGA 
2530 2540 2550 2560 2570 2580 2590 2600 


2130 2140 2150 2160 2170 2180 2190 

ATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCAC 

i t t t i : i t i i f i i i i : i l ; i i ! i i i i t t i i i t i t i i i t i i I ! i i i i l t i t i I I I I i l i I I i i t l I I t l 
i i i i l i r ! i t j t i i t i t i t ; t t i i i i i t i t t i i i i i i : t i i i i i i i i i i i i i i i t i i t i i i i i i i i i i i 

ATAuP.TTTfGCi GTG5TTTCTATAGTSAATAGAGTTAGGCAGGGAT ACTCACCATTATCGTTTCAGACCCTC 
2610 2620 2630 2640 2650 2660 2670 






ICAACCCCf :RACCCG Ai7>GSCCCGAAGGAATAGAAG!flMBPHPJM|^^^^^^^^^^^^B 

CTCCCAAACCCGAGGGRACCCGACAGGCCCGAAGEAACCGAAGAAGGAGGTGGAGAGAGAi^^^^^^^M 
2680 2690 2700 2710 2720 2730 

2270 2280 2290 2300 2310 2320 2330 

TCCATTCGATTAGTGAACGGATCCTTAGCACTTATCTGGGACGATCTGCGGAGCCTTGTGCCTCTTCAGCTA 

, - t , , i , t ; t : • i t i r i t i t t t i t i i i i i i » I i t t i i i * t i i i i i i i i t i t i I i t i t I i i i 

, , , t ; « i t i : i < : i i i C i r i I i i i i i ) i i i i i i t t l l < i t t i t i i i i ) l i l t t .Ill 

TCCACTCGATTAETGCATGG.CTTCTTAGCACTTGTCTGGGACGATCTGCGGAGCC-TGTGCCTCTTCAGCTA 
2750 2760 2770 2780 2790 2800 2810 

2340 2350 2360 2370 2380 2390 2400 2410 

CCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAGGGGGTGGGAAGC 

, , j . i t * i : f i » i t : i t j i i i i i f i i t i t t i t ! t i i t t i t t i i t i i i i i i i i t i i * * i • * .. 

( j t J 1 t | ,( ) | t I I t t t I • > I I I I 1 I > t I ■ < 1 * I I ' ■ ' 1 1 > 1 1 ' 1 1 1 1 1 

CCACCGCTTGALf-iGACiT ACTCTTGATTGT AGCGAGGATTGTGGAACTTCTGGGACGCAGGGGGTGGGAAGT 
2820 2830 2840 2850 2860 2870 2880 

2420 2430 2440 2450 2460 

CCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAACTAAAG 

t | | ; l t l l l I ( * t t t t l I l l t l l t t t | ) ( | i | [ I i t l l l t I l I l l l * 

I I I t I I 1 I tlllllilllllllilllfttlll 

CCTCAAATATTGGTGGAATCTCCTGCAGTATTGGAGTCAGGAACTAAAG 
2890 2900 2910 2920 2930 X 


4. KUNZ-15S-CL32. SET! 

HIVPV22 Human immunodeficiency virus type 1, isolate PV22, 


LOCUS       HIVPV22      9770 bp ss-RNA             VRL       15-JUN-1989
DEFINITION  Human immunodeficiency virus type 1, isolate PV22, complete genome
            (H9/HTLV-III proviral DNA).
ACCESSION   K02083
KEYWORDS    TAR protein; acquired immune deficiency syndrome; complete genome;
            env gene; gag gene; long terminal repeat; pol gene; polyprotein;
            provirus; rev gene; reverse transcriptase; tat gene;
            trans-activator.
SOURCE      Human immunodeficiency virus type 1 (HIV-1), isolate PV22 (from
            H9-derived family), proviral DNA.
  ORGANISM  Human immunodeficiency virus type 1
            Viridae; ss-RNA enveloped viruses; Retroviridae;
            Lentivirinae.
REFERENCE   1  (bases 1 to 9770; revised sequence: personal communication)
  AUTHORS   Muesing,M.A., Smith,D.H., Cabradilla,C.D., Benton,C.V.,
            Lasky,L.A. and Capon,D.J.
  TITLE     Nucleic acid structure and expression of the human
            AIDS/lymphadenopathy retrovirus
  JOURNAL   Nature 313, 450-458 (1985)
  STANDARD  full staff_review
REFERENCE   2  (bases 2111 to 2112; revises [1])
  AUTHORS   Muesing,M.A.
  JOURNAL   Unpublished (1987) Whitehead Inst Cambridge, Mass
  STANDARD  full staff_review
COMMENT     This sequence for a H9/HTLV-III virus was determined from one
            complete proviral clone [1]. Additionally, several cDNA clones of
            the viral RNA were sequenced for comparison with the entire
            proviral sequence. The differences between cDNA and proviral DNA
            are extensive and are listed in the Sites Table as variations. The
            authors believe that the variations may be due in part to different
            strains in the H9/HTLV-III cell line, because it was established by
            infection with material from several AIDS patients.

            With the addition of g at 2111, gag cds and pol cds are very close
            to those of HXB2, BRU, and related HIV viruses.

            For details and other references pertaining to Sites and Features,
            see the HIV reference entry.

FEATURES    from    to/span   description
  pept    < 2094    5141      pol polyprotein (NH2-terminus uncertain; AA at 2084)
  pept      5CS£i   5684      vif protein
  pept      560’-:  5840      vpr protein
  pept      5876    6090      tat protein, exon 2 (first expressed exon)
            8421    34E;G     tat protein, exon 3 (AA at 8422)
  pept      6015    6090      rev protein, exon 2 (first expressed exon)
            8421    13695     rev protein, exon 3 (AA at 8423)
  pept      84 07   S352      vpu protein
  pept      E2b (   8057      envelope polyprotein
  pept      as3&    9453      nef protein

  pre-msg   464     9673      genomic mRNA
  pre-msg   464     9673      tat, rev, nef subgenomic mRNA
  IVS       753     5822      tat, rev, nef subgenomic mRNA intron 1
  IVS       6091    8420      tat cds intron 2
  IVS       6091    8420      rev cds intron 2
  IVS       6091    8420      tat, rev, nef subgenomic mRNA intron 2

  LTR       10      643       5’ LTR
  LTR       9128    9761      3’ LTR
  rpt       4S3     5S0       R repeat 5’ copy
  rpt       9581    9673      R repeat 3’ copy
  binding   33c-    395       Sp1 binding site III
  binding   397     406       Sp1 binding site II
  binding   400     417       Sp1 binding site I
  binding   646     662       primer (Lys-tRNA) binding site

  variant   510     510       a in provirus; g in cDNA [1]
  variant   575     575       g in provirus; a in cDNA [1]
  revision  2111    2112      gg in [2]; g in [1]
  variant   5718    5716      g in provirus; a in cDNA [1]
  variant   5992    5992      a in provirus; g in cDNA [1]
  variant   6007    6007      c in provirus; t in cDNA [1]
  variant   6047    6047      c in provirus; g in cDNA [1]
  variant   6051    6051      c in provirus; a in cDNA [1]
  variant   6055    6057      agg in provirus; gaa in cDNA [1]
  variant   6108    6108      t in provirus; c in cDNA [1]
  variant   6120    6120      a in provirus; c in cDNA [1]
  variant   6125    6126      gc in provirus; gtaac in cDNA [1]
  variant   6136    6136      a in provirus; c in cDNA [1]
  variant   6235    6235      t in provirus; a in cDNA [1]
  variant   6352    6352      g in provirus; a in cDNA [1]
  variant   6760    6760      t in provirus; a in cDNA [1]
  variant   7050    7060      c in provirus; t in cDNA [1]
  variant   7100    7100      a in provirus; g in cDNA [1]
  variant   7134    7135      ca in provirus; ac in cDNA [1]
  variant   7103    7104      gt in provirus; aa in cDNA [1]
  variant   7199    7199      a in provirus; g in cDNA [1]
  variant   7234    7235      aa in provirus; gc in cDNA [1]
  variant   7303    7303      © in provirus; c in cDNA [1]
  variant   7511    7511      a in provirus [1]; c in cDNA [1]
  variant   7333    - lt 7    t in provirus [1]; a in cDNA [1]
  variant   7586    7586      c in provirus [1]; t in cDNA [1]
  variant   7643    7648      a in provirus [1]; g in cDNA [1]
  variant   8139    8136      a in provirus; c in cDNA [1]
  variant   8143    8143      t in provirus; c in cDNA [1]
  variant   8222    8222      g in provirus; a in cDNA [1]
  variant   8269    8269      a in provirus [1]; g in cDNA [1]
  variant   8285    8285      g in provirus [1]; t in cDNA [1]
  variant   8376    8376      a in provirus [1]; g in cDNA [1]
  variant   8381    8381      a in provirus [1]; g in cDNA [1]
  variant   8476    8478      a in provirus [1]; g in cDNA [1]
  variant   oSB9    nnr.n     a in provirus [1]; g in cDNA [1]
  variant   8975    8979      c in provirus; t in cDNA [1]
  variant   8950    8990      a in provirus; c in cDNA [1]
  variant   r't—i.— r-t 8969  c in provirus [1]; a in cDNA [1]
  variant   9031    9031      a in provirus [1]; g in cDNA [1]







  variant   9235    9295      g in provirus [1]; t in cDNA [1]
  variant   9303    9303      g in provirus [1]; a in cDNA [1]
  variant   9548    9548      g in provirus [1]; c in cDNA [1]
  signal            9659      mRNA polyadenylation signal
  prov      10      9761      HIV-1 proviral DNA
  cell      1       9         human cellular DNA
  cell      9762    9770      human cellular DNA




BASE COUNT     3439 a   1786 c   2376 g   2172 t





ORIGIN 


ORIGIN      bp upstream of Ball I site. 
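
The variant rows of the Sites table above follow a regular pattern: a key, a from and a to coordinate, then a note of the form "x in provirus; y in cDNA [1]". As a minimal sketch, assuming one whitespace-separated row per site, such rows could be parsed and sanity-checked as below. The names `Variant`, `parse_variant` and `span_matches` are illustrative only and do not belong to any database tool.

```python
import re
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    start: int      # first proviral coordinate of the site
    end: int        # last proviral coordinate of the site
    provirus: str   # bases reported for the proviral clone
    cdna: str       # bases reported for the cDNA clone


# Assumed row layout: "variant <from> <to> <bases> in provirus[ [n]]; <bases> in cDNA ..."
_ROW = re.compile(
    r"^variant\s+(\d+)\s+(\d+)\s+([acgt]+) in provirus(?: \[\d+\])?; ([acgt]+) in cDNA"
)


def parse_variant(row: str) -> Optional[Variant]:
    """Return a Variant for a well-formed row, or None for anything else."""
    m = _ROW.match(row.strip())
    if m is None:
        return None
    return Variant(int(m.group(1)), int(m.group(2)), m.group(3), m.group(4))


def span_matches(v: Variant) -> bool:
    """True when the coordinate span is as long as the proviral bases listed."""
    return v.end - v.start + 1 == len(v.provirus)


if __name__ == "__main__":
    v = parse_variant("variant 6055 6057 agg in provirus; gaa in cDNA [1]")
    print(v, span_matches(v))
```

A check like `span_matches` is one way to flag rows whose coordinates were damaged in transcription, since a single-base substitution must have equal from and to values.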


Initial Score = 1877 Optimized Score = 2190 Significance = 0.00 

Residue Identity = 89% Matches = 2259 Mismatches = 153 

Gaps = 110 Conservative Substitutions = 0 
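
The summary above reports matches, mismatches and gaps for the alignment that follows. As a rough sketch of how a percent identity can be derived from such counts, the function below divides matches by the number of aligned columns. Whether the search program counts gap columns in the denominator is an assumption here, and the gap count is hard to read in this printout, so the value 110 is only a placeholder.

```python
def percent_identity(matches: int, mismatches: int, gaps: int = 0) -> float:
    """Percent of aligned columns that are identical.

    Assumption: every match, mismatch and gap position counts as one column.
    Pass gaps=0 to get identity over ungapped columns only.
    """
    columns = matches + mismatches + gaps
    if columns == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Matches and mismatches as printed; the gap count is a placeholder.
    print(f"ungapped columns only: {percent_identity(2259, 153):.1f}%")
    print(f"gaps counted (assumed 110): {percent_identity(2259, 153, 110):.1f}%")
```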

X 10 20 30 40 50 60 

AAGAG-CAG-AAGACAGTGGCAATGAGAGTGAAGGAGAA-ATATCAGCACTTGTGGAGA-TGGGGGTGGA 

. , , I T 111 I I I t I I 1 » < III. 1 I 1 I 1 I I I I I 

,,,,,,, t I I ttt til I J I I I I 1 1 I I I I I I I I I 1 I I I I I 

AATAGACAGGTTAATTGATAGACTAATAGAAAG—AGCAGAAGACAGTGGCAAT-GAGAGTGAAGGAGAA 

6220 6230 6240 6250 6260 6270 6280 

70 80 90 100 110 120 

AATGGGGCAC—CATGCTCCTTGGGATATTGATG-AT-CT—GTAGTGCTACAGAAAAATTGT—GGGT 

, i t i t it i t t i i titt i i i i i i i » tt i (iiit 

, i i r i it i t t t i iii: till iiit ii i i i i t t 

ATATCAGCACTTGTGGAGATGGGGGTGGAGATGGGGCACCATGCTCCTTGGGATGTTGATGATCTGTAGTGC 
6290 6300 6310 6320 6330 6340 6350 

130 140 150 160 170 180 

CACAG-TCTATTATGGGGTAC-CT-GTGTGGAA-GGAAGCAA-CCACCA-CTCTATTTTGTG 

, , , , , , , i i j i t i it i i i i i .. i t i i i t i t i i i i i 

, , , | tit I ! 1 I II It It II I (till II I I I I I I II I II II 

TACAGAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAGGAAGCAACCACCACTCTATTTTG 
6360 6370 6380 6390 6400 6410 6420 

190 200 210 220 230 240 

CATCAGATGCTAAAGCATATGATA-CAGAGG-TACATA-AT—GTTTGGGCCACACATGCCTG—T 

t i < t t i lit ii ii 1 i till i it ii ii * i i i i i ii i 

lit i it i t i : t ii t i titt t it *i * * i i i i i i ii i 

--ATCAGATGCTAAAGC-ATATGATACAGAGGTACATAATGTTTGGGC-CACA—CATGCCT 

6430 6440 6450 6460 6470 6480 

250 260 270 280 290 300 310 

GTACCCACAGA-CCCCAACCCACAAGAAGTAGTATTGGTAAATGTGACAGAAAATTTTAACATGTGGAAAA— 

II | || I It) 1 II ! I tit till lit II I till t III I I I I 1 

II I t i : lit l it It til I I I I til It I I I I I I III t I I I I 

GTGTACCCACAGACCCCAACC—CA—CAAGAAGTA-GTA-TTG-GTAAAT-GTGACA-GAAAAT 

6490 6500 6510 6520 6530 6540 

320 330 340 350 360 370 

-ATGACATGGTAGAACAGATG-CATGAGGATATAATCAG—TTTATG—GGATCAAAGCCTAAAGCCATGTG—T 


TT"! A.ACAA --3TGGAA- AAATGACA1 
6550 6560 


- GGT AGAA—CAGATGCATGAGGATATAA-TCAGTTTATGGGAT 

6570 6580 6590 6600 


380 390 400 410 420 430 440 

AAAA—TTAACCCCACTCTGTGTTAGTTTAA—AGTGCACTGATTTGG-GGAATGCTACTAAT-ACCAA 

lit i ; : tit tiitii i t i i iiii iti i i i i t i i i i t ii 

ii- ttt ill t t t t f t i i i i i i i i III i i i i i i i i i i ii 

CAAAGCCTAAAGCCA-TGTGTAAAATTAACCCCACTCTGTGTTAGTTTAAAGTGC—ACTGATTTGAAGAA 

6610 6620 6630 6640 6650 6660 6670 

450 460 470 480 490 500 510 

TACTAGTAATACCAATAGTAGTAGCGGGGAAATGATGATGGAGAAAGGAGAGATAAAAAACTGCTCTTTCAA 

, , , i t i i i i « i i t i i i i i t i i < i i t . i i i t i i i i i i t i i i i i i i t ■ i i i •• • i • i i i < ■ • > 

I , [ 1 I I ; 1 I I I I f I I I I I i 1 • ! I : I I t 1 I 1 I | I I I I | | 1 t t 1 I I I I I 1 I I I I 1 1 I I I I I 1 1 t t I I 1 

TGATACTAATACCAATAGTAGTAGCGGGAGAATGATAATGGAGAAAGGAGAGATAAAAAACTGCTCTTTCAA 
6680 6690 6700 6710 6720 6730 6740 


520 530 540 550 560 570 580 

TATC AGCACA^C-'N ATAAGAGGT A6.GGT GCAGAAAGAAT ATGCATTTTTTT AT AAACTTGAT AT AAT ACCAAT 


l j i i i i i ; t 1 i i i i t t I i i I ; t i : i i t l I i t l i l l l l l t i l l i t i 
l j t i t > i i i * i I t i : t : ' t i t i i t i t t i t i i i i i i i i t t i t i I t * i 

TQ~rr:/ 'P;f 'y.y.',r- .I :&TP AFS4Rt?TA&Rrr.TreCARAAARA/vrQTrtr’ATT.TIl 


1 I 1 I I I 1 I 1 I I 




II.,, |,!J, f | , . 1 , 1 , I ■ 1 ; !, JMw I I wll M 1 , I . II I 1 1 .w'-Jl il > I 

6750 6760 6770 6780 6790 6800 6810 

590 600 610 620 630 640 650 

AGATAATGATACTACCAGCTATACGTTGACAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGT 

t i i < t i t t i i ; i i t i t t i t t i i i i t i t i i i < i i i i i i i t i t t i i i i t i i i t i i i i i i i i i i i t i i t .. 

t t ; i i ; t t : i : : t i ? t : i : t : i ' i i i t t t t i i < i i i t i i i < « l t < t t t ■ i i i i i i i i i t i i < t i • • i i l i i i 

AGATAATGATACTACCAGCTATACGTTGACAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGT 
6820 6830 6840 6850 6860 6870 6880 

660 670 680 690 700 710 720 730 

ATCC: f FTCT-iCCCOAT i CCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAATGTAATAATAAGAC 

I | 1 I t t I t ! I 1 1 * t ) t 1 I t t I 1 t t I t 1 I t I I I I I I I t I I I t I I 1 I I > I > t I I I I I t I I I I I I t I I I I I I I I I 

t I I 1 ( t I ! 1 I ! I ! t 1 I t 1 I I I ! 1 I I I t I I r I 1 I I I ! I t I I I I t • » 1 * t • ■ * 1 * l I I I » I * • * * t • • 1 I t I I 

ATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAATGTAATAATAAGAC 
6890 6900 6910 6920 6930 6940 6950 6960 

740 750 760 770 780 790 800 

GTTCAATGGAAGAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATC 

i i t t i i i i i i i i i i i i t t i i t i i i i i i i i i i i i i i i i i i i t ( ... i i i » t t i i i i i < i i i i i i t i i » t 

i t t t t t : t ; i t i t t i : : t t t i t i ; i ; ■ i i t i i ■ i i t i ( i t i i i i > i t > i < > i • > i ■ i < < ■ ■ i < • * < t i i i i i 

GTTCAATG.TAACAGGACCAT GT AC AAATGTCAGC ACAGT ACAATGT ACACATGGAATT AGGCCAGT AGT ATC 
6970 6980 6990 7000 7010 7020 7030 

810 820 830 840 850 860 870 

AACTCAACTGC': G7T9AATGGCAGTCTAGCAGAAGhAAGAGGTAGTAATTAGATCTGCCAATTTCACAGACAA 

i ■ i r i t ? I i 1 ( i i t i i i i i t : t i t : i i i i i i t > I i 1 t I i i t i t * i i i t i i > » I ■ ■ ' ■ ( t i I i * I t i i i i I i 

i I I I t I I : 1 * • t i i ■ ? i i t t 1 1 ! t i i i t i i : I t i > i i i I 1 » I s i i ..I i t I I I ) I t I i I I I i 

AAC’i C AACTGCTGTT A A AT6GCAG fCT AGC AG AAGA AG AGGT AGT AATT AG ATCTGCC AATTTCACAGACAA 


7040 7050 7060 7070 7080 7090 7100 

880 890 900 910 920 930 940 

TGCTAAAACCrYi 

AAT AGT ACAGCTGAACCAATCTGT AGAAATT AATTGT ACAAGACCCAACAACAAT ACAAG 1 


i i ; i i t i i i ; i i < i t . t t i ; i i ! ; i i : f t t i i t i t ; : i t i t i i i i i i ■ t i i t ■ t i ( i i i i i i i i i i ( i i < i i 

i i t : ( t : i i t : > i ■ ) r ( i i r t s t t i i i i t i i i i i : t i i i i » ( i t i i i t t « i t i i i t i i ■ i i t * i i i i ■ i i i i 


TGCTAAAACCATAATAGTACAGC1GAACCAATCT6TAGAAATTAATTGTACAAGACCCAACAACAATACAAG 
7110 7120 7130 7140 7150 7160 7170 

950 960 970 980 990 1000 1010 

AAAAAGTATCCCTATCCAGAGCGGACCAGGGAGAGCATTTGTTACAATAGGAAAAATAGGAAATATGAGACA 

i t i t * i i t t • i i i i i i i i i i i i t i i i t t • i i i t t ■ i i i t t t i t i .. i i i ■ t t t i i i i i i t i i i i 

i t t t > > t : > I t t i i t t i > I i t i t ( | i i I i t t I I 1 t t t t I I 1 » I t t I I I > I I i I I I I I t I I > I I I f I I I t I I 

AAAAAGTATCCuTATCCAGAGABT.ACCAGGGAGAGCATTTGTTACAATAGGAAAAATAGGAAATATGAGACA 
7180 7190 7200 7210 7220 7230 7240 

1020 1030 1040 1050 1060 1070 1080 1090 

AGCACATTGTAACATTAGTAGAGCAAAATGCAATGCCACTTTAAAACAGATAGCTAGCAAATTAAGAGAACA 

t i i i i t i r t i ! : t t t I I i i i i i i i i t i 1 i III ( I I I I I I I i i I I I i i i I I I i i i i I I i I i i i i i i I I 

I F ( : i I i i : t t i i i I i .. ... ill i i i i i t i i i I I i t i i I i i i I I t i i t l i i i i t i i i f 

AGCACATTGTAACATTAGTAGAGCAAAATGGAATAACACTTTAAAACAGATAGATAGCAAATTAAGAGAACA 
7250 7260 7270 7280 7290 7300 7310 7320 

1100 1110 1120 1130 1140 1150 1160 

ATTT liE ; AAAl AATAAAACAA VAATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTT 

t t t * i » » i i t * i * t i i t t t i t t t i i i i t i t i i i : t i t i i i i i i i i i i i i i i » t t i i t 1 i » i i i i i i i i i i ■ 

i j i i t i i t t i ' t i t : • t l i t t t i t i i t t ■ ) i i i l i i i l i ■ t t i i i i i l ) i i l i i i i i i i i l • ■ i i i ) * l l t i 

ATTTGGAAATAAIAAAACAAFAATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTT 
7330 7340 7350 7360 7370 7380 7390 

1170 1180 1190 1200 1210 1220 1230 

TAATTGTGGAGGUGAAiTT f TCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGTTTAATAGTACTTG 

i t j t i i i t i i i i i t : i i i i i t i i : i i i i i i i i c i i i i i i » i « i i • i i * i I » * I I i I i * * * • ' 1 i I I • t I I I * 

i r t i I I I i i I t i t i l i i i i i i t l i l i i l i t l t l i i i i * i * ' * * * * ■ > 1 * > 1 * ‘ 1 * ' * 1 1 1 ' ' • 1 * ' • 1 

TA'-rrrGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGTTTAATAGTACTTG 
7400 7410 7420 7430 7440 7450 7460 

1240 1250 1260 1270 1280 1290 1300 

GAGTACTGAAGGGTCAAATAACACTGAAGGAAGTGACACAATCACACTCCCATGCAGAATAAAACAATTTAT 

i i I i i ! i ? t t i » i ' * t ! i j t ’ 1 ; i t i i i i I t I > i t s i t t I I i t t i i t t 1 t 1 1 1 I I I 1 1 I I i i l I I i i i t t i t 

i i i ) 1 i t l i i t i i i i i t t i » * i i t t » i i i i i i i i i i i » i i i * i i i i * * * > » * * • < * > * » * * > » * * • * i • 

GAG':AC:T6AA&:,GTCAAATAAChCTGAAGGAA6TGACACAATCACACTCCCATGCAGAATAAAACAATTTAT 
7470 7480 7490 7500 7510 7520 7530 

1310 1320 1330 1340 1350 1360 1370 

AAACfVf5TH9 “JiTGGA AGT AGE£ A A'\ECAATST ATGCCCCTCCCATCAGCGGACAAATTAGATGTTCATCAAA 

i t l t i t t : i i t i i t i i i l t i t i ! i i i i i i i i i i i i i i i l i i i i i ' i l i • • l i i i t l ■ i l ■ * < < < • i i i i i i i 

i t i i i i 1 i i i : : i i t i i t t i ; i • : t i t t ; t t i t i t t i i t t i i i i i t i i i t i i i i i t i i i i > I t i i i t t i i i i 





7540 7550 7560 7570 7580 7590 7600 


1380 1390 1400 1410 1420 1430 1440 1450 

TATTACAGGGC'l'biCT f T PiftChftG AGftTGGTGGT AftT AACAACAATGGGTCCGAGATCTTCAGACCTGGAGG 


i i i i t i i i 


i t t t i t i i t t t i i < i i i i i i i i i i 


• rATTACAGGF. (7! 'GCTAT rAACAAGAGA*! GGTGGTAA1AACAACAATGAGTCCGAGATCTTCAGACCTGGAGG 
7610 7620 7630 7640 7650 7660 7670 7680 

1460 1470 1480 1490 1500 1510 1520 

AGGAGATA' fGAGC-lGACAATTLiGAGAAGT G A ATTATATAAATATAA AGTAGT AA AA ATTG AACCATT AGGAGT 

, t , ; , t i i t t • t i t i i i i t t t i i t t t i i i i t i i t i i i t i i i > i i i > i i i • i i > < * • t t i i i i i i > i i i i 

t i i t i t t i i ; ! i : t i i i : i l t t t t i t i r ... t t i i i i i t t i i i .. i i l t i i i t l i i i i l 

AGG AJ 2 ATATG AGGGAi' l A ATTQG Ai.?ftAGTGA ATTATATAAATAT AAAGT AGT AAAA ATTG AACCATTAGGAGT 
7690 7700 7710 7720 7730 7740 7750 

1530 1540 1550 1560 1570 1580 1590 

AGCACCCAC:nftAGGGAAAGAGAAGAGTGGTGCAGAG«3AAAAAAGAGCAGTGGGWATAGGAGCTTTGTTCCT 

t) flllltl lllllllllllt*' 1 • * lltlllll**** 

AGC, r, .CGCAOCAAGGC''AAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCT 
7760 7770 7780 7790 7800 7810 7820 

1600 1610 1620 1630 1640 1650 1660 

TGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCACGGTCAATGACGCTGACGGTACAGGCCAGACAATT 

, : : , t t i : : i t t t i ; : i i i : i i ; i t i i i < t t i 1 i t i i t i ) i ( i i t i i i t i . i i i i l ( i i t » t i t 

, i , . i t t : t ’ i ; * • i ■ i - i * i ; i ! 1 ' * * 1 ' 1 1 1 * ‘ * 1 

TGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAATGACGCTGACGGTACAGGCCAGACAATT 

7830 7840 7850 7860 7870 7880 7890 

1670 1680 1690 1700 1710 1720 1730 

ATTGTC.'T&VfATAGTr'.CAGOAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAACAGCATCTGTTGCAACT 

t . i , , , , i • i t i t t i t ; t t t i i t t i t t i ■ t i t i i i i i t t i .. i i i i i i i t i i i i i i » t i i i i i i * i 

i | , , ; . t ( * t i i i i : t : i i t t t f t t t i i i * i i t i t t i i i i • * t t * * t * 1 1 i 1 1 f * 1 1 1 1 ' ’ 1 1 1 ' * * ' 1 1 1 1 

ATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAACAGCATCTGTTGCAACT 

7900 7910 7920 7930 7940 7950 7960 

1740 1750 1760 1770 1780 1790 1800 1810 

CACAlTrCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCT 

, , , . , , , , t , , . , , , , , , , , . , , , , , , » . i i t * ! « i i r I i i i i i * t t i ... t i i t I i I i t i I I ■ l l t l 

, , , t f t • : : t : t t i • i i f i i i ; r i i r t t t r i : i t t i t i t ) ... i , l , i i i i i i t l l i i i i i i i i i i 

CACAl vi 1*673337031 'CAAGCAGCT CCAGGCAAGAATCCTGGCTGTGGAAAGAT ACCT AAAGGATCAACAGCT 
7970 7980 7990 8000 8010 8020 8030 8040 

1820 1830 1840 1850 1860 1870 1880 

CCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAA 

l i i i i i i t t * i 1 l l i v 1 l 1 t t i » t ( i t i i l i > * < • i i > > i .I i i i i i i i i t i l i 1 t l 1 t i l » l l 1 i i 

! | ( j t i t i i t t r t i t t i i t i i t t : i i i i i t t t i i t i i i i i i t i i t i t t i « i t t i t i i i t t t i » • i t t t i i i i 

CCTGGGGA !'TT GGSGTTECTCTCt AAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAA 
8050 8060 8070 8080 8090 8100 8110 

1890 1900 1910 1920 1930 1940 1950 

TAAATCTCTGGAACAGATTTGGAATAACATGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAG 

i m::::;:::;: : 

TAAATCTCTGGAACAGATTTGGAATAACATGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAG 
8120 8130 8140 8150 8160 8170 8180 

1960 1970 1980 1990 2000 2010 2020 

CTTAATACATTCCTTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGA 

t , | j t i * i t t t : t : i i i i ; t i t i i i i • * t t t i » t i i t i t t i i i i i i i > t t t * « < * * » ' • • i > » * 1 ' < 1 

r I I I t ( t I r ! t • f I t t l I I : * t I ! I I t t t 1 t I I I ! I I I I I l t I 1 1 t ! I I I I t t t I 1 t t I I * I I I t t t 1 I ! I 

CTTAATACACTCCTTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGA 
8190 8200 8210 8220 8230 8240 8250 

2030 2040 2050 2060 2070 2080 2090 

TAAATGGGCAAGTTTGTGGAATTGGTTTAACATAACAAATTGGCTGTGGTATATAAAAATATTCATAATGAT 

, , , , | ( t l t ( I I ( I t I 1 I I t l 1 ! I I I I I I I I t I I t I I t t I t 1 * I I I I I I I I I I.. I I I I I I I I I 

I I t , i i i i i t t t t i t i t t i i ; t i ! ) t i t i i i t ( i t \ i i t i » t i * t t i i i t i i » i • < * ' * * i * ' * ' 1 ' ' ' 

TAAATGGGCAAATTTGTGGAATTGGTTGAACATAACAAATTGGCTGTGGTATATAAAATTATTCATAATGAT 
8260 8270 8280 8290 8300 8310 8320 


2100 2110 2120 2130 2140 2150 2160 2170 

AGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATA 



8330 8340 8350 8360 8370 8380 8390 8400 


2180 2190 2200 2210 2220 2230 2240 

TTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACAGGCCCGAAGGAATAGAAGAAGA 


i i i i t i i : i 


i i i i i i t i i t i t t i i t t i i t i i i i i i i i t t i t i i i i i t i i i 

i i i i t i i i i i t t t t i i i i i i t i i i i i i t i i i i i i i i i i i i i 


TTCMCCAJ f A, T i. :GT I C AGACCCMCCTCCC A ACCCCGAGGGG ACCCG AC AGGCCCG A AGG A AT AG AAGAAG A 
8410 8420 8430 8440 8450 8460 8470 

2250 2260 2270 2280 2290 2300 2310 

AGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCCTTAGCACTTATCTGGGACGATCT 


AGATGCAGAGAGAGAC: :GAriTVCArX;TCCATTCGATTAGTGAACGGATCCTTAGCACTTATCTGGGACGATCT 
8480 8490 8500 8510 8520 8530 8540 

2320 2330 2340 2350 2360 2370 2380 

GCGGAGCCTTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAAC 

I i : I t i ? [ I i t r t i ! i I i ; 1 : i I » t t 1 : ( i > t t t » I i i i ) I ! .t i i i i i i i i i t i i i i * » » t 

1,(111:: : i r t i t ■ : t i t t : t : I t t t i i l t t t l i l l t l l < l t i i l l t i l l i i t t l ) i t l * i l i l i i l l l i 

GCGGAGCC-TGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAAC 
8550 8560 8570 8580 8590 8600 8610 

2390 2400 2410 2420 2430 2440 2450 

TTCTGGGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAACTAA 

i ,, i t i i : I i t t i i : t 1 I I i i I i I I 1 1 I ) i ( i I t l i i i t i i I I 1 I I t t I I I I i i ■ i i i i i i i i i t i lilt 

i j j > t : i t t . : i * t i t t i » i i i i : i i t i : t * t i t i t i i t i i i i i r i t i i i t t i t t t t i * i ■ • t i i « i t t i 

TTCnisGGAlCGCKGGGir-iyTGGGAAiGCCCTCAAATATTGGTGGAATCTCXlTACAATATTGGAGTCAGGAGiCTAA 
8620 8630 8640 8650 8660 8670 8680 


AG 

8690 
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COMMENT The; BRIO sequence differs from BH8 and BH5 by 0.9% in the coding 

regions and 1.8% in the noncoding regions, and the authors of [1] 
believe that these are stable variants. 

The HTLV-III genome encodes at least seven proteins: gag, pol, env, 
tat, trs, 27K antigen and the sor 23K product. The 3’ ORF 
(positions 8.153 -3773) is truncated in BH10 (stop codon at positions 
8522-8524), but reads through in BH8 and other sequences to yield 
what is now called the 27K antigen. 










gag-pol fusion protein is possible; splicing or frameshift have not 
been ruled out. The viral protease would be determined by the 
region in question. 

The Tat protein (trans-activator protein, approximately 14 kd) is 
an effector of an autostimulatory pathway through interaction with 
a positive control element, the trans-activating responsive 
sequence, TAR. Tat seems to be a transcriptional control molecule 
in HTLV-I, but is both that and a post-transcriptional regulatory 
molecule in HTLV-III. Deletion mutants in the tat gene are 
incapable of prolific replication and exhibit no cytopathic effects 
in T4+ cell lines. 

In addition to the ~9.4 kb genomic mRNA, subgenomic mRNAs of 7.4, 
5.5, 5.0, 4.3, 2.0 and 1.8 kb have been detected. 
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papt 

1 12 
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gag pelyprotein precursor 
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pol polyprotein (NH2-terminus uncertain; AA 
1407 > 
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vif protein 


pep'c 
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vpr protein 


pBc t. 
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tat protein, exon 3 ( AA at 7735) 

exon) 

p S K L 
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300b' 

rsv protein, exon 2 (first expressed 
rev protein, exon 3 (AA at 7736) 

exon) 


            5420    5665      vpu protein
  pept      55 SO   8150      envelope polyprotein
  pept      3132    UJit •:>  nef protein, exon 3 (first expressed exon; premature termination)

  pre-msg < 1     > 8932      genomic mRNA
  pre-msg < 1     > 8932      tat, rev, nef subgenomic mRNA
  IVS       GB      5135      tat, rev, nef subgenomic mRNA intron 1
  IVS       5404    7733      tat cds intron 2
  IVS       5404    7733      rev cds intron 2
  IVS       5404    7733      tat, rev, nef subgenomic mRNA intron 
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AAGAN-CAG-AAGACAGTGGCAATSAGAGTGAAGGAGAA-ATATCAGCACTTGTGGAGA-TGGGGGTGGA 

t [ i ; ! i i i t til ill l t i i : i l i l tilt l l l l t i l i l i 

. : • 1 I t I |< < ill Til 1 t I i t t 1 t I tilt I t t I i I lilt 

AATAGACAGGTTAATTGATAGACTAATAGAAAG-AGCAGAAGACAGTGGCAAT-GAGAGTGAAGGAGAA 

5530 5540 5550 5560 5570 5580 5590 

70 80 90 100 110 120 

AATGGGGCAC-CATGCTCCTTGGGATATTGATG-AT—CT—GTAGTGCTACAGAAAAATTGT-GGGT 

i tiii tt i i ■ i i till i i i i iiti it i i i i i i 

i tii: t t t - t t ) iitt till till II * i I I I i 

AT ATCAGCAC'h GTGi ; ,AGA7 GGGt.GTGG.AGATGGGGCACCATGCTCCTTGGGATGTTGATGATCTGT AGTGC 
5600 5610 5620 5630 5640 5650 5660 

130 140 150 160 170 180 

CACAG—TCTATTATGGGGTAC-CT-GTGTGGAA-GGAAGCAA—CCACCA—CTCTATTTTGTG 

lilt III Till t • * II II ) I I I I 1 It I till I II I II II 

i : t t tit t i I t| t t II ii i till.Ill t II I II it 

TACAGAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAGGAAGCAACCACCACTCTATTTTG 
5670 5680 5690 5700 5710 5720 5730 5740 

190 200 210 220 230 240 

CATCAGATGCTAAAGCATATGATA-CAGAGG-TACATA-AT—GTTTGGGCCACACATGCCTG—T 

i : ; ! I t I ! t * I I : I t t t i i t II i i i 1 i I ■ I 1 III I 

* : t t it ill t t I t i t » ( ( i II I i t i I 1 I I I ill 1 

. -"!! JC-ATUAGAT GCT A A AGCATATGATACAGAGGTACATA ATGTTTGGGC—CACA—CATGCCT 

5750 5760 5770 5780 5790 






250 260 270 280 290 300 310 

GTACCCACAGA-CCCCAACCCACAAGAAGTAGTATTGGTAAATGTGACAGAAAATTTTAACATGTGGAAAA- 


GTS' i ACCC :pi. "pi -,ACCC ’.GAACC-GA-CAAGAAOTA-GT A-TTG- 


—GTAAAT—GTGACA-GAAAAT 

5840 5850 


320 330 340 350 360 370 

-ATGACATGGTAGAACAGATG-CATGAGGATATAATCAG-TTTATG—GGATCAAAGCCTAAAGCCATGTG-T 


i : i : i 


TTTAACAT-GYGGAA- AAATSACAT-GGTAGAA- CAGATGCATGAGG AT AT AA T C AGTTT ATGGG AT 


5 SO O 


5380 


5900 


5910 


380 390 400 410 420 430 440 

AAAA—TTAACCCCACTCTGTGTTAGTTTAA-AGTGCACTGATTTGG-GGAATGCTACTAAT-ACCAA 

iii t t i t t i i t t ; t : t t i t l i « ill i i i i i i i i i i ii 

I ; . t : t III Jill! I 1 I t I I III II I I III III II I II 

CAAAGCCTAAAGCCA-TGTGTAAAATTAACCCCACTCTGTGTTAGTTTAAAGTGC-ACTGATTTGAAGAA 

5920 56-10 5950 5960 5970 5980 

450 460 470 480 490 500 510 

TACTAGTAATACCAATAGTAGTAGCGGGGAAATGATGATGGAGAAAGGAGAGATAAAAAACTGCTCTTTCAA 

, | j | ; , [ I I I I , I t I i i ; I • i r I t i l t I l I I I t i i i i l i i i t l l l l l l l l < l l l l l ■ ■ l I l * l l 
i * i , | t t i i t : i t t r i i i i i i t i i i i i t t i i i i i i i i I i i i i t i i i i i i i i I i t i i i i i • i i i 

TGA v P:C7AP.':T.CGAPT'AGTA.GTf:GCGGSAGAATGATAATGGAGAAA6GAGAGATAAAAAACTGCTCTTTCAA 

5990 6000 6010 6020 6030 6040 6050 

520 530 540 550 560 570 580 

TATC. ^GC'.A:";AA:-©ATAAGAGGTAP.G6TeCAGAAAGAATATGCATTTTTTTATAAACTTGATATAATACCAAT 

, , , i i i i i ; i i i • i t ; t i i ; • i i i • t i i i i i i i i i i t i i i t i t i i i » t i i i i i i i i i i i • • > * i * » > i • i 1 

i t : i i i : i : i i i I ; i i t j i » j ; i i i i i i i I i r r i I i i i i I I t i t i * i » i • < • * * i I i i i t i i i i i 

T,A1 CAGCACAACCAT A. AGAEGT P; AGGTGCAGA, AAGAAT ATGCATTTTTTT AT AAACTTGAT AT AAT ACCAAT 

6060 6070 6080 6090 6100 6110 6120 6130 

590 600 610 620 630 640 650 

AGATAATGATACTACCAGCTATACGTTGACAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGT 

i i • i * i t i i t i i ' r r i i t t : t i j i i i t t i i i t i i i t i i i t i i i i i i i i i i i i i i i i ' * « « i < 1 ' * < * 

, i i . : : i t ; : i ’ t t : i : i i t . ; i t . : i » i I t t i i ( l i i i l i i i t i i i i * t t * i l * i * * * * < * * » < ' ‘ < ' ' > ' 

AGATAATGATACTACCAGCTATACGTTGACAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGT 

6140 6150 6160 6170 6180 6190 6200 

660 670 680 690 700 710 720 730 

ATCi" T‘H til ‘*C-C 5 A A'P'CCCPTPCPTT A‘l TGTGCCCCGGCTGGTTTTGCGATTCT AAAATGT AAT AAT AAGAC 


i t i i i i i i i i I i i i i i t 


i I i i i i i i t 1 t i I 1 i I I I i 


ATCI - fTTGAGCCAATTCCCATACATTAT TGTGCCCCGGCTGGTTTTGCGATTCT AAAATGT AATAAT AAGAC 
6210 6220 6230 6240 6250 6260 6270 

740 750 760 770 780 790 800 

GTTCAATGGAAGAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATC 


6TTC.;-,P,TP .-PTOPGGPGCATGTACAfiAT 6TCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATC 
6280 6290 6300 6310 6320 6330 6340 

810 820 830 840 850 860 870 

AACTCAHC'i'GCTGTTSAATGGQ-'iC-.TCTAGCAGAAGP.AGAGGTAGTAATTAGATCTGCCAATTTCACAGACAA 

j , , i i i i i • : i i i i : t i i ; t t > : i i i t i i i i : i i t i i i I i i t I I I i I I I i t I I.I I I I i I I i i i t 

l » i , t ! I I i t : t * * i i I l < r l i t I ’ l l ! I I t I I I l l I t t I I I I I I I I I I I I I I I I I l I I I I I I t I I I l I 

AACTI :ap,C 1 riT .iTi 'l Pi' 1A1 GGCAc TCTGGCAGAAGAAGAGGTAGTAATTAGATCTGCCAATTTCACAGACAA 

6350 6360 6370 6380 6390 6400 6410 

880 890 900 910 920 930 940 

TGCT AP. AA'SC.'-YT Afi T AGTACAOTTGAACCAATCTGTAGAAATTA ATTGTACAAGACCCAACAACAAT ACAAG 


T13CT AAAACCA1 P.P:''PJ-iTP.CP.GC r 6AP.CCAATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAG 
6420 6430 6440 6450 6460 6470 6480 6490 


950 960 970 980 990 1000 1010 

APP.4AGTP.TCCf- ff v rf ;i :A5Al3SeGACCP.EGGAGAGCATTTGTTACAATAG6AAAAATAGGAAATATGAGACA 


i t i i t i i t i i i ■ i i i t i t i i i t i i i 


AAP.-.AGTP I'C 7.1:5 .CAGPGPs ! -\ CC P.G GG PC AGC : ATTTGTT AC A AT AGG A A A A AT AGG A A AT ATG AG AC A 
6500 6510 6520 6530 6540 6550 6560 











1020 1030 1040 1050 1060 1070 1080 1090 

AGCACATTGTAACATTAGTAGAGCAAAATGCAATGCCACTTTAAAACAGATAGCTAGCAAATTAAGAGAACA 

;i I::;; i i::::; : : : : ::::::::::::::::: :::::::::::::::::: 

AGVCCA' r T nTAACAT 'I AGTAGAGCAAAATGGAAT AACACTTTAAAACAGAT agatagcaaatt aagagaaca 
6570 6580 6590 6600 6610 6620 6630 

1100 1110 1120 1130 1140 1150 1160 

ATT '; i .L-'YT'rrr-v^■TAnAAPCAAYrPATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTT 

. . . . : . , , , , , t t I t ) i : 1 i t i • : 1 ( I ) I > > I > t i i I I » i * i « t i i t i .. l i i I I » i I I i i t I i i 

t , • i t ; I t t r r i i t i i t t l l i < t 1 i i t t t i i i t » t t i i i l • I t l l l t t l * i t I * • * * l * * I * * I I 1 1 * * 1 * 

ATi i hTsAAi-YTAP. f AA'-.ACAAYrPATCTTTAAGCAGTCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTT 

6640 6650 6660 6670 6680 6690 6700 

1170 1180 1190 1200 1210 1220 1230 

TAft-. nlTM: ■’iTTTCPP 1 YfTTT TCTOCT GTAATTCAACACAACTGTTTPATAGTACTTGGTTTAATAGTACTTG 
!!!:!!!!!! V 7 I !!!!!!!!!!!!!!!!!!!!!!!!!!!! I !!!!!!!!! ! 

TA^CT VGTGNPGGGGAi-'T'iT'l TCT AiCTGT AATTCAACAC AACTGTTTAAT AGTACTTGGTTTAATAGT ACTTG 
6710 6720 6730 6740 6750 6760 6770 

1240 1250 1260 1270 1280 1290 1300 

GAGTACTGAAGGGTCAAATAACACTGAAGGAAGTGACACAATCACACTCCCATGCAGAATAAAACAATTTAT 


6PT-1 ACTAAA T' T 7 TCP.AAT AACACTSAAGGAAGTGACACAATCACCCTCCCATGCAGAATAAAACAAATT AT 
6780 6790 6800 6810 6820 6830 6840 6850 

1310 1320 1330 1340 1350 1360 1370 

AAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGCGGACAAATTAGATGTTCATCAAA 

. . t , , , , ( t • ( T t f ( t I I I 1 t t I I t t I t I I * * I * I t 1 I * • * • • * * 1 I I t I I I I * t I f I I I l 1 I I I • * * * 

, , , . j . , , ; . 1 , : , • ! f r 1 t f 1 ! t t 1 1 I t t t i t I I I l t I 1 * I I t t * 1 I t 1 I I I I 1 > I » I 1 t • » • t * • * * 1 

AP.ftC:'-;TGTt'l.7-:P('6P..-M-n-ASGAAAAGCAATGTATGCCCCTCCCATCAGTGGACAAATTAGATGTTCATCAAA 

6870 6880 6890 6900 6910 6920 


1380 1390 1400 1410 1420 1430 1440 1450 

TATTACAGGGCTGCTATTAACAAGAGATGGTGGTAATAACAACAATGGGTCCGAGATCTTCAGACCTGGAGG 
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i i i i i : t i t • t i t t i r i i t t i : t » i i i i i i i i i i i j i i i t t t i i i i i i i t i t t i i i i i * • i i t i i i l t i l 
l i i i i i i i i i ; i t ■ r i t i ; i t i * t t i x t i • i t t i t t i i 1 t » I t i • I t i I t i i i t I I I » t t I I t i i i i » i i 

CATYATCGTTTCAGACCCACCTCCCAAfCCCGAGGGGACCCGACAGGCCCGAAGGAATAGAAGAAGAAGGTG 
2620 2630 2640 2650 2660 2670 2680 


2250 2260 2270 2280 2290 2300 2310 

GAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCCTTAGCACTTATCTGGGACGATCTGCGGA 

i i i i t i i ; i i : j ; : i : i t i : t ; : ; i t t j i i i : i i t t i t . . t i i i i t i i i l x x i l i i i l i i x x l l 

i i i i i % x l I i i t t ; i i t t t t ' i t t i t t t X I i t i i i t i i i t t I i X I i t I « t i 1 I i i t X I I x t i l * i l t t i i l l 

GAGAGAGAGACAGAL5ACAGAYCCATTCGATTAGTGAACGGATCC.TTAGCACTTATCTGGGACGATCTGCGGA 
2690 2700 2710 2720 2730 2740 2750 2760 


2320 2330 2340 2350 2360 2370 2380 2390 

GCCTTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTG 

ill I t I t t : I I t I 1 t T t t 1 t : t I t t t t I t I t I t 1 t t I t I I 1 I ■ I I I I t t I I t I I I I t I I I I I I I I I t I f I I 
III i i i i i t t i i • t i • i i . t i t t i t i i i i t I i t t i i i i I i I I I I I I I I t I I I I I l I . till 

GCC—TGTGCCTCTTCAi'CTA'.l, ICCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTG 
2770 2780 2790 2800 2810 2820 2830 


2400 2410 2420 2430 2440 2450 X 

GGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCAGGAACTAAAG 

i i i i t t i i < i i i t i t i t I [ t : t i i i i i t t i i < t i i t t i i i i i i i ■ i i i i i l i t t * i i i i i t i i i i i i 
l i i i t i i i i i i i i r t t t t t i t t t i i i i i t i i t i i i t i i i i i i i i » i i ■ i t i i i i i ■ i i i i i x i i i i t 

GGACGCAGGGGGTGGGAAGCCC 'T 'AAATA7TGGTGGAATCTCCTACAATATTGGAGTCAGGAGCTAAAG 
2840 2850 2860 2870 2880 2890 2900 


7. KUNZ-158-CL32.SEQ 

HIVNL43 Human immunodeficiency virus type 1, NY5/BRU (LAV- 
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HIVML43 6709 bp SS -RNA- VRL 15-JUN-1989 

Human immunodeficiency virus type 1, NY5/BRU (LAV-1) recombinant 
clone pNL4-3. 

M19921. 

Human immunodeficiency virus type 1 (HIV-1), NY5/BRU (LAV-1) 

recombinant clone pNL4-3. 

Human immunodeficiency virus type 1 

Viridae; ss-RNA enveloped viruses; Retroviridae; 

Lentivirinae. 
1 (bases 1 to 9709) 

[...] Buckler-White,A.J., Willey,R.L. and McCoy,J. 
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RERZf' 


AUTHORS 
JOURNAL 
STANDARD 
REFERENCE 
AUTHORS 
JOURNAL 
STANDARD 
COMMENT 


[...] 

[...] Gendelman,H.E., Koenig,S., Folks,T., Willey,R., 

[...] Martin,M.A. 

[...] acquired immunodeficiency syndrome-associated 
[...] human and nonhuman cells transfected with an 

[...] molecular clone 
[...] (1986) 

[...] 

[...] revisions of [1] [...] 

sequence for [1] kindly provided by Chuck Buckler, NIAID, 
[...]4-JUN-[...]. The construction of pNL4-3 has been 

[...] pNL4-3 is a recombinant (infectious) proviral 
[...] DNA from HIV isolates NY5 (5' half) and BRU (3' 
[...] of recombination is the EcoRI site at positions 

[...] sequence of the vpr coding region corresponds to 
[...] SF2, MAL, and ELI isolates. The vpr coding 
[...] isolates is about 18 amino acid residues longer 
[...] region of the IIIb isolates. In HIVNL43, this 
[...] single base deletion (with respect to the IIIb's) 
[...]. The sequence at this position is "atttc" in 
[...] in HIVHXB2. 

[...] clone, sequenced by Wain-Hobson, et al. (Cell 40, 
[...] and the BRU portion of the pNL4-3 recombinant clone 
[...] clones from the same BRU isolate. 

[...] reported in the FEATURES produced changes in 
[...] sequences. The revision at position 2421 changes one 
[...] residue from 'R' to 'G' in the pol coding region. The 
[...] changes three amino acid residues 
[...] in the nef coding region. 

FEATURES 

description 
gag polyprotein 
pol polyprotein (NH2-terminus uncertain; AA at [...]) 
vif protein 
vpr protein 
tat protein, exon 2 (first expressed exon) 
tat protein, exon 3 (AA at 8370) 
rev protein, exon 2 (first expressed exon) 
rev protein, exon 3 (AA at 8371) 
envelope polyprotein 
nef protein 
genomic mRNA 
tat, rev, nef subgenomic mRNA 
tat, rev, nef mRNA intron 1 
tat cds intron 2 
rev cds intron 2 
tat, rev, nef mRNA intron 2 
5' LTR 
3' LTR 
R repeat 5' copy 
R repeat 3' copy 
Sp1 binding site III 
Sp1 binding site II 
Sp1 binding site I 
primer (Lys-tRNA) binding site 
EcoRI site of recombination 
HIV-1 isolate NY5 DNA end/HIV-1 isolate LAV DNA start 
at in [3]; tg in [1] 
a in [3]; c in [1] 


r i i 



BASE COUNT 
ORIGIN 

[...] in [3]; ctcaca in [1] 
c in [3]; a in [1] 
mRNA polyadenylation signal 
23:1.6 e: 2166 t 


Initial Score    = [...]72  Optimized Score = 2181  Significance = 0.00 
Residue Identity = [...]    Matches         = 2238  Mismatches   =  187 
Gaps             = [...]    Conservative Substitutions       =    0 

77 30 40 50 60 70 

CO. V1 1 : 3 T ixf AGGAGAAATATCAGCA-CTTGTGGAGATGGGGGTGGAAATGG 

T ■ . . 1 t ! : i t i lit i 1111 ill i 111 i 1 

... i : iiiti t iii i » i t i t i i i iii i i 

"A A { i 5:oTi iSACO' AA'I AG a a AG AGC AG AAGAC AGTGGC AATG AG AGTG AAGG AGA 
CO G200 6210 6220 6230 


! GAY -Gh TCTG PAG- 


110 120 130 

-TGCT AC—AGAAAAATTGTGGGTC—ACAGT 


i_. "i s i 'iA(v. ■ iGbiGGG f Gb: A A A' I'GGlaGC AUC ATGCTCCT TGGGAT ATTGATGATCTGTAGT 
•' ; 6270 6280 6290 6300 


:0 

..AT A'i 


-;0 ISO 170 180 190 

: Pi G.7G PRG. A' IRGAAGCAACCACCACTCTATTTTG-TGCATCAGATGCTA 

. IT! It I III II I lilt II 

; 1 || ! lit t I I till II 

-Of i -f; P ATI'ATGRGG-T ACCTGTGTGGAAGGAAGCAACCACCACTC 

0 BT-'IO 6350 6360 6370 

720 230 240 250 

-I A.-:! GT Y - -TGGGCCACA -CATGCCTGTGTACCCACAGACC—CCA 

t . i : i r . t i t t t i f i ii i i i i i i it 

i , , t : [ | I i III ill II I I till II 

7 AY- -• PGA fACAGAGGTACATAATGTTTG—GGCCACACATGCCTGTGT 
: to 8410 6420 6430 6440 

230 300 310 320 

, tTfG'PR ACAbAAAATT'i TAACATGTG-GAAAA—ATGACATGGTAGA 

, . ; it i t III i i i i l i l i i i l liii il ii 

, , , tt j t III i i i i i i t t l i l l t i l It tl 

iRAAGY -Alii—'fATTGGTAA—ATGTGACAGAAAATTTTAACAT—GTGGA 

5,170 6480 6490 6500 


. 550 360 370 380 390 

PATf'.'TUtA 1 - —Tl 7.-.Tur-GGATCAAAGCCTAAAGCCATGTG—TAAAA—TTAACCCC 

; . ■ : . i til lilt tl I I ill t I III ill tl 

; - || ! | i lit till t I I I lit I I III III II 

PA6 AT •CA: :AT!i,CAT6A6GATATAA-TCAGTTTATGGGATCAAAGCCTAAAGCC 

-.CTO f .540 6550 6560 6570 


4 j. 0 420 430 440 450 

PA' v-AG rGCACTRATTTGG-GGAATGCT ACT A AT-ACCAATACTAGTAATACCA 

■ ; | lift III | I I I I I I I I I III I I I I I I I t I I 

l i i t t i t ; I I ill III t I I III II l l l I I I I 1 

i A -'iCCCT -i'C I tTT GT05TT AGTTT AAAGTGC—ACTGATTTGAAGAATGAT ACT AAT ACC A 
: :bi.O 6600 6610 6620 6630 6640 


; 430 460 500 510 520 

’3 ‘i.-Vl C •. ,-QPi TRGARAAAGGAGAGATAAAAAACTGCTCTTTCAATATCAGCACAAGNA 

. ; j , ■ , | | ; I ! t t | I t I I I I I I I I I I 1 l I I I l 1 1 I I ' ' * ' ' I '• 1 1 ' 

i i j t i i i t t t r i t i i i i i i i t I i < i i • i * i * * * * * * • 1 1 * 1 1 * 1 * 1 * 1 1 

-AlCCAm-*. .'AA, i'GirA'TAAAGGAGAGATAAAAAACTGCTCTTTCAATATCAGCACAAGCA 




73 CIO 
OC3 CJ\J 


6690 


6700 


6710 


560 570 580 590 600 

ATTTTTTTATAAACTTGATAT AATACCAATAGATAATGATACTA 


i : t t i i I i i i i > > > 


f -/^v tr;;v ^ v , Tec:ATTCTTTTATAPl( o iCTTG p jTAT p i GTACCAATAGATAA- 

6750 6760 6770 


S40 


650 


660 


670 







\'-,'Tr !* 1 AOTf CA* Tf CA‘ !‘TACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAA 

oO - ' RO10 38.20 6830 6840 6850 


•:vvrp.- Vi 


■o 700 710 720 730 740 

bOC' -Ob 1 JTbG n TTLSC&ATT CTAAAATGT A AT A AT AAGACGTTCAATGG AAC AG 


f-4 :1Q 


O'rbiGTTTTGCGATTCTAAAATGTAATAATAAGACGTTCAATGGAACAG 
2850 6830 6900 6910 6920 


xl'iTiT 


-,o 770 730 790 800 810 

1 C AGO: v;«luTACATYTST ACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGT 


i i i i i i » i i i i i i ( i i i i i i i i i 


atgtoiv: 


k"! t w.uOi s*— ! * • 




: ''rAGO’O.Or.GTACAATGTACACATGGAATCAGGCCAGTAGTATCAACTCAACTGCTGT 
6550 3960 6970 6980 6990 

140 850 860 870 880 

^CiOGGAlb-lYXiASGTfiGTAATTAGATCTGCCAATTTC^CWQACAATGCrrAAAACC^VTAA 


: •:;' VuvPiO.1 ATG'i 'ASTAATT AGAT CTGCCAATTTCACAGACAATGCT AAAACCATAA 

•uoo 7030 7040 7050 7060 


OtJPlK. ■' 


i.l'JOH'l O; ) 1 


YiOf! '-A 


220 930 940 950 960 

CAOTC1 ,iVYG'-iAAT fAATTGTACAA6ACCCAACAACAATACAAGAAAAAGTATCCGTA 

! , I I I ; : • ; I 1 I 1 I I I ■ t t ( I I t 1 • 1 I I 1 t t I I I I t I I t I ■ t I I t < t 1 I 1 I I ( I I I 

: , ; , i , , - , r ( I : i i i i t t i r i i i t t ( ( i I 1 i i » t i I i I i I t I l l t I I I i t I I I 1 

4CATC1 i! AGOAAT TAATTGTACAAGACCCAACAACAATACAAGAAAAAGTATCCGTA 

V y. 30 7100 7110 7120 7130 

--iSO 630 1000 1010 1020 1030 

,71:: ,-AC-;A;.-/:i ATT f"£?TTACAATAGGAAAAATAGGAAATATGAGACAAGCACATTGTAACA 

• t ; • j : i ■ i t t ’ 1 1 i t l l i I l t t l l l t l i I i l i l t l i i t l I i l l I .It 

, i t i i r ! i i : i i i i t t i i t t i i i t t i i i i t i t : i i i i i i i i i • * t i i i i i i i t » t t i 

: 17G AEAGCATTT(T.1T ACAAT ARGAAAAAT AGGAAAT ATGAGACAAGCACATTGTAACA 

7130 7170 7180 7190 7200 7210 

, 0:30 1. OSO 1070 1080 1090 1100 

i •; ;,i *A 1;;; rTT7.Ar-y-V3ASATAGCTAGCAAATTAAGAGAACAATTTGGAAATAATA 

! . . , , ; . , j < , i i j i i t i i i i i i i : i i i ( i i i i i i i i i * i * i i ' • ' * i * ' ' 1 1 * 1 

,11 b i 1 v; T T i ftftAACAEATAGCTAGCAAATTAAGAGAACAATTTGGAAATAATA 

7.130 7240 7250 7260 7270 7280 

1130 1140 1150 1160 1170 

•ib.3i-.il -43 i Cl:•iV:A9uA6:GGiaACa^GAAATTGTAACGCACAGTTTTAATTGTGGAGGGG 


i i i i i i i i i l l i t i i t i i i 


. ,;-50A3 '7:!-i.AGG hivGGGACCCASAAATTGTAACGCACAGTTTTAATTGTGGAGGGG 
V--.J 0 7320 7330 7340 7350 

> 'rOO 1210 1220 1230 1240 

'AAV rCAr-VOAC.-A.O'i GT*f' fAATAGT AC'iTGGTTT A AT AGTACTTGGAGTACTGAAGGH3T 


[*A A'! TC A.AG ACA ACTRTTTAAT AGT ACTTGGTTTA AT AGTACTTGGA6TACTGAAGGGT 
, j ,4oU 7590 7400 7410 7420 

127 0 1.380 1230 1300 1310 1320 

. ,i.'-CAOAA1 CACACTCCCATGCA(^ATAAAACAATTTATAAACATGTGGCAGG 

. , , , , , , . , , , : . t ( I | , 1 I | t t I I I t I I I I I I I I I I * 1 1 1 1 > 1 * • • > 1 1 > I > 

t ! I 1 1 l • I 1 1 I l t I t I I I I I I I I t I I t t I I I I l I.I I I I I I I.. 

:i.-yo ) :jiiV.l'-lCAA'i’CACAilTCCCATGCAGAATAAAACAATTTATAAACATGTGGCAGG 
74 50 7460 7470 7480 7490 


7500 




.'40 1350 1360 1370 1380 1390 

rt-Y !"'; PCCCATCAGCGRACAAATTAGATGTTCATCAAATATTACAGGGCTGC 


i i i i i i i i t * i 


•i:: rra iCATC: -iG TGSACAAATTAGAT6TTCATCAAATATTACTGGGCTGC 

4250 7540 7550 7560 7570 


1430 1440 

-'-^Tr:nROi-fliciTm r 


1450 


1460 













vjt- i uroiui 


•.''.-.T^r:Cf-ACP^Ti^GTCCC?fti3;ATCTTCft6ACCTQGAGG5AGGCGATATGAGGG 


73; 




; fc.OO 


7610 


7620 


7630 


7640 


I \’?r, MOO 1.300 1510 1520 1530 

■ V-'V'.' ■' . i biJ.7;. ^ h v i t~ v, i i ft ; f : -■ \ f'-Vj fG i AAftln 1 AG! AAAAATTGAPiCCATTAGGAGTAGCAL-CCACCAAGG 

t f t r t * j • i . i . • : r f ; i ! i t ! i i i i t i i i i i i i i i i t i i i i i i i i l i i i i l r t t 

i t t 1 i ! ' t t I I I I 1 [ I t t | | | | 1 I I I I I I I 1 I I ■ I I I f I t I I I I I 1 • I I I I t I I 

’.■•V r yi.V.:7^.. : 7 . 1 ■ A : -\i-V r A f 2AASTAGTAAAAATTGAACCATTAGK3AGTAGCACCCACCAAGG 

77.fi* > •,.':.70 7600 7690 7700 7710 


ir : - ; c •. •:' c .1570 isso isao 1600 

I 7 - : •': 1 1: ' '• V?£-& Ml! 1AAAAA6A6CABTGGGAATAGGAGCTTTGTTCCTTQGGTTCTTQGGAQ 

* t ' : • : • ; • • : 1 i t I * I ! 1 - - ) : t f 1 1 1 : t t t t t t I I 1 I I 1 I I I 1 * t » t 1 1 * I 1 I I t t * I I I I I I I t 
: t • * > -1 ■ I ■ . • ' : . : ; : : t ; l l t 1 t ) : l t t 1 1 f l I 1 1 1 1 l t t 1 1 t 1 1 t 1 1 l 1 l t l 1 t I I I 1 1 l ) 

U- 7 V" : v ; -77777 1: ~AtN'V?.^6rtAAAAA6Al5CA6TCKSAATAGGAGCTTTGTTCCTTGGQTTCTTGiGK3AQ 

773’. Ti-y :■ "t"Y 7750 7760 7770 7780 

1610 1620 1630 1640 1650 1660 1670 

•;•,374.7 .ro-v. oosc 7-Of:iCQ6T;'AATTiAlOGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATA 

i i t • * ..ft;*' • r : • t i t t i i i i i : ; i t i t I » i t i t i i » I i i i I r t i i i I I I i i i till 


i : ■ t : i t r t * i t i t t it*: t i i t ... l l < I I t I t t t I I I.. I t i I t t t i I t I I 

■; -v:-,:-rv•■• 4 7 : pgc -'ccao-gtcaatracgctgacggtacaggccagacaattattgtctgatata 

7700 70,' 70 5 0 7820 7830 7840 7850 

1680 1690 1700 1710 1720 1730 1740 1750 

i • i • ,1 Y'iiVG7i,.:7 ,':.:4iAr.AA7T-|U:r.TbiAlTGGCTATTGAGGCGCAACAGCATCTGTTGCAACTCACAGTCTGGGGC 

I I ! I : r I 1 I t ] I - i : ! I ’ ) t ! t t t i I ! i i i i i i I I I I • I * 1 * I 1 t I I I I t i i ... i i i i i I 

! * i t . : i l j ; t : : : : : > • - . ) ! i : t i i t t t i i i i i i i 1 i I t i I I i t t i i » i t t 1 I i i » I.. I 

1 i ■■ j.'vi,:;.-'. ; v. •. nroci oa&ssctattgaggcgcaacagcatctgttgcaactcacagtctggggc 

7870 -ui‘ O 786*0 7890 7900 7910 7920 7930 


■7S. 1770 1780 1790 1800 1810 1820 

.‘Vi ‘vf » A6CAH'. T AAS* if-VT CCTGf?CT GTGG AAAGAT ACCT AAAGGATCAACAGCTCCTGGGGATTTGG 


t i t t 


i'VtrjAnACA'.V’TiOt'.Y.'.QGC'AAiaMi-.TCCTGSCTGTKGAAAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGG 
7940 7950 7960 7970 7980 7990 8000 


• 830 

!-?;7 '"4-.iCT6 177.. 

I t 1 l 1 I ! f 

GlGiTGCTC f GG:i 
G010 

19-00 

CAGfVT rTGGAA' 

i : ; * i i i * 

i * i : i i j t t ■ 

Uf 7 : 7 VTTu'..:: 7 Y 

7030 


1840 1850 1860 1870 1880 1890 

, 194,C: fCA’I iTGCAGCACTGC rGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAA 

i * t : ■ : i : t : t : t ; : : f i t i i t : i i i i i i i t t t i t i i t t i i i i i i i t t t i i i i i i t i i l • 

i • t t i i i t r r i t i i i > i t i i i t » i i i i i t i : » i i i i i t t ■ i i i i * i i i t » i i i • i » i i i 

:AAAt r! CAT • 7 GCY'CCACTGCTGTGCCTTGGAATGCT AGTTGGAGT AAT AAATCTCTGGAA 
8020 8030 8040 8050 8060 8070 

.730 1330 1940 1950 I960 

•AAavr6AC-CTe&.TVrGG6GTGGKACAG(=tf3AAATTAACAATTACACAAGCTTAATACATTCC 

: ‘ : : . : [ t i i i t i c t ; i i l i i l i i i l t i < t i t i l i l i i t t t t ■ i t i i l < l l t i < l i til 

i ■ ) s i t : t • t t i i i t i i I I I i ( i i I i t l i I I l I i i I I t I l i i i i l i i t i i i III 

! VIC A T<3 AC;: TGC, f Gft AGTGEaG AC AG AG AA ATT A AC AATT ACACA AGCTT AAT AC ACTCC 
8090 8100 8110 8120 8130 8140 


1970 1980 1990 2000 2010 2020 2030 

TTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGT 

• t : • t ! t l t i i ! i ‘ i i ; t i t t i i ; i t i i : t t i i i t t i i t i i t t l f t i t i i i i i i i i f i t i i t l i i t i i i i i i 

t r • ( ■ t t i i : : ■ i : i t t t i t ■ t i ! i : t i t t t i t i i i i i I i I i I i t i I t t i i t t I I I ■ i I t i > I i I i t I t I 

TTAs ! i GAA:, AA VC :»' 'lAAACCAGC AA6AAAAGAATGAACAAGAATT ATTGGAATT AGAT AAATGGGCAAGT 

8150 8160 8170 8180 8190 8200 8210 


2040 2050 2060 2070 2080 2090 2100 2110 

' f'OVn T7 GAh‘ f>* 64-" f TTA/ VCAT A ACA AATTGGCTGTG83TATATAAAAATATTCATAATGATAGTAGGAGGCTTG 

■ i ; t : ; i t i i i t i i i i • i i i i i i ; : i : i i i : t i i i i i i i t i i t i i i i i i i i i i i i i t I i i t i i i i i 

i i i t i i t i t ; i t i : i t i i i i t t j i .. i t i i i i i i i » i i t i i i i i ■ i i i i i i i i i i i i i i i i i 

rn: r i gcaa gt; go ttaacai aacaaattggctgtggtatataaaattattcataatgatagtaggaggcttg 

8220 8230 8240 8250 8260 8270 8280 8290 


2120 2130 2140 2150 2160 2170 2180 

5 -'{'.I .G PH 'i 7"A'-H 77T • 'SCTi viTACTTTCTATAGTGAATASAGTTAGGCAGGGATATTCACCATTATCG 

1 ; i i i i : ) i l t i i i : : * i i t t J t i t t t i t i i i i t i i i i i l i l i i 1 i i i t i i i i i i l i l l • t i t l » i l i 
: i i i i i i * t r i t i i t i i ’ t t t i t i i i i i i f t i i i t i i t i t i i * i t t i i i i i i i i i i i i i i i • t t t i i t i i 

I t;-\ j .TTTi7':7A.' vrP.aTTT’i' TGCTfSTACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCG 
8300 8310 8320 8330 8340 8350 8360 


90 




7210 2220 2230 2240 2250 

' l -.''fy.,T; l:fl rrr,iarQnr;rrrnflOBr: Q flTflTOQf;flOCQaB^TO fl c fl cflf: a 





















I I ! I I ' 1 I I I < I I I I I 1 I I t t I 

I I I I 1 I t t t 1 1 I I I f t I I I 1 t I 


Ttrf ;qL'5AD'\’C4,CGvaj_.L';;ifyrijx.'CGA0l:iiGGACCCGACAGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGA 
8370 8380 8390 8400 8410 8420 8430 


2260 2270 2280 2290 2300 2310 2320 

;yLACACAFS fCFATTCFATTAGTGiftACGGATCCTTAGCACTTATCTGGGACGATCTGCGGAGCCTTGTG 


1 t I 1 I t 1 t t t t t t l T t : ! 1 1 t I t I I t t I I « I ■ ■ * • 1 1 1 t t ( 1 I ' I • I 1 » * 1 * * t t I 


qq iGAGACASA fCCA TTCGATTAGTGAACGGATCCTTAGCACTTATCTGGGACGATCTGCGGAGCC—TGTG 
8440 8450 8460 8470 8480 8490 8500 


2330 2340 2350 2360 2370 2380 2390 

am:TTCA0C7ACCACCGCT ,*GAiiAGACTTACTCTT6ATTGTAACGAGGATTGTGGAACTTCTGGGACGCAG 
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CCTOTTCAitFTACCOCCGCT rHr'G ASAC - iTAC‘ fCTTGATTGTAACGAGGA77GTGGAAC7TCTGGGACGCAG 
8510 8520 8530 8540 8550 8560 8570 


6£SFTGGGAf<-7X;CTGAAATATTeGT6GAATCTCCTACAGTATTGGA6TCAGGAACTAAAG 
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Sequence for [31 kindly provided in computer-readable form by 
L. Ratner, 15-AUG-1986. 

The HXB2 sequence is being used as a reference genome for all the 
HIV entries because it has been derived from a demonstrably 
infectious clone. Hence not all of the "sites" references above 
were concerned with this isolate. 


FEATURES    from   to/span  description 
  pept       789    2291    gag polyprotein 
  pept      2357    5095    pol polyprotein (NH2-terminus uncertain; AA at 2357) 
  pept      5040    5618    sor 23K protein 
  pept      5558    5794    R (ORF) protein 
  pept      5830    6044    tat protein, exon 2 (first expressed exon) 
            8378    8423    tat protein, exon 3 
  pept      5969    6044    trs protein, exon 2 (first expressed exon) 
            8378    8652    trs protein, exon 3 
  pept      6224    8794    envelope polyprotein 
  pept      8796    9167    27K protein (premature termination) 
  mRNA       455    9635    HXB2 genomic mRNA 
  pre-msg    455    9635    tat, trs, 27K subgenomic mRNA 
  IVS       6045    8377    tat intron 1 
  IVS       6045    8377    trs intron 2 
  IVS       6045    8377    27K mRNA intron 2 
  IVS        743    5776    tat, trs, 27K mRNA intron 1 
  IVS       6045    8377    tat, trs intron 2 
  LTR          1     634    5' LTR 
  LTR       9085    9718    3' LTR 
  rpt        454     551    R repeat 5' copy 
  rpt       9538    9635    R repeat 3' copy 
  binding            386    Sp1 binding site III 
  binding            397    Sp1 binding site II 
  binding            408    Sp1 binding site I 
  binding    636     653    primer (Lys-tRNA) binding site 
  revision  5611    5611    g in [3]; a in [4] 
  signal    9611    9616    HXB2 mRNA polyadenylation signal 
BASE COUNT  3411 a   1773 c   2370 g   2164 t 
ORIGIN      435 bp upstream of PvuII site; 5' end of proviral genome. 
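
The coordinates and base counts quoted in this entry can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original printout: the helper names are hypothetical, and only values that are legible in the entry are used (gag 789 to 2291, R 5558 to 5794, tat exons 5830 to 6044 and 8378 to 8423, and the four base counts). It assumes that coordinates are inclusive and that a coding region should span a whole number of codons.

```python
# Illustrative consistency checks on values quoted in the HXB2 entry.
# Helper names are hypothetical; coordinates are assumed to be inclusive.

def span_length(start, end):
    """Length of an inclusive coordinate span."""
    return end - start + 1

def cds_length(exons):
    """Total coding length over one or more exon spans."""
    return sum(span_length(start, end) for start, end in exons)

features = {
    "gag polyprotein": [(789, 2291)],
    "R (ORF) protein": [(5558, 5794)],
    "tat protein": [(5830, 6044), (8378, 8423)],
}
base_count = {"a": 3411, "c": 1773, "g": 2370, "t": 2164}

for name, exons in features.items():
    n = cds_length(exons)
    status = "whole codons" if n % 3 == 0 else "not a multiple of 3"
    print(f"{name}: {n} nt ({status})")

# The base counts should add up to the sequence length, which is also
# the last coordinate of the 3' LTR.
print("total bases:", sum(base_count.values()))
```

Under these assumptions each listed coding region is a multiple of three nucleotides, and the base counts sum to 9718, the end coordinate of the 3' LTR.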


Initial Score    = 1858  Optimized Score = 2174  Significance = 0.00 
Residue Identity =  88%  Matches         = 2243  Mismatches   =  169 
Gaps             =  110  Conservative Substitutions       =    0 
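
The relationship between these summary figures can be sketched as follows. This is an illustration only: the printout does not state how the search program derives "Residue Identity", so the formula below (matches divided by all aligned columns, gaps included, truncated to a whole percent) is an assumption that happens to reproduce the reported 88%. The function name is hypothetical.

```python
# Assumed definition: identity = matches / (matches + mismatches + gaps).
# This is a guess at the program's formula, chosen because it reproduces
# the reported figure; it is not taken from the program's documentation.

def residue_identity(matches, mismatches, gaps):
    """Fraction of alignment columns that are identical residues."""
    aligned_columns = matches + mismatches + gaps
    return matches / aligned_columns

fraction = residue_identity(matches=2243, mismatches=169, gaps=110)
print(f"{fraction * 100:.1f}%")   # 88.9%
print(f"{int(fraction * 100)}%")  # 88% after truncation
```

Leaving the gap columns out of the denominator would give about 93%, which does not match the printed value, so the gap-inclusive form is the more plausible reading.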


X 10 20 30 40 50 60 

AAGAG--CAG-AAGACAGTGGCAATGAGAGTGAAGGAGAA-ATATCAGCACTTGTGGAGA-TGGGGGTGGA 

; : : : ; : :;; ::::::::: ::: : :::::: : : : : 

AATAGACAGGTTAATTGATAGACTAATAGAAAG-AGCAGAAGACAGTGGCAAT-GAGAGTGAAGGAGAA 

X 6180 6190 6200 6210 6220 6230 6240 

70 80 90 100 110 120 

AATGGGIBCAC—CATGGTCCTTGiGGATATTGATG-AT—CT—GTAGTGCTACAGAAAAATTGT—GGGT 

: : ; ; : : ; : : : : : : : : : : : :: : : : : : : : : : : : : 
ATATCAGCACTTeTGGAGATGGSEGTGGAGATGGGGCACCATGCTCCTTGGGATGTTGATGATCTGTAGTGC 
6250 6260 6270 6280 6290 6300 6310 


130 140 150 

CACAG— TC f ATT ' ATGGGGTAC CT GTGTGG AA- 


160 170 180 

-GGAAGCAA-CCACCA-CTCTATTTTGTG 


TACAGAAAAATTG.!GGGTUACAS I'CTATTAIGGGGIACCTGTGTGGAAGGAAGCAACCACCACTCTATTTTG 

r,-^c 3 rj r, 3 /i.n 



190 200 210 220 230 240 

CATCAEA1 GCTAAAGCATA1 GATA-CAGAGS—TACATA-AT—STTTGGQCCACACATGCCTQ—T 

i : i > t » t t t i i t i i i t i i t i ii i i i i i i t i i lit i 

lit t ii ill it ii i • iiii i ii ii ii i till i ii i 

-TGC-ATCA6ATGCTAAA6CATATGATACAGAGGTACATAATGTTTGGGC-CACA—CATGCCT 

6390 6400 6410 6420 6430 6440 

250 260 270 280 290 300 310 

GTACCCACAGA-CCCCAACCCACAAGAAGTAGTATTGGTAAATGTGACAGAAAATTTTAACATGTGGAAAA- 

it 1 c r I t t I ii it ill i I I i til II i iiii i til i i i t 1 

it i it i tit i ii ii tit iiii ill t i i iiii i tit i i i i t 

GT61 ACCCACAGAGCCCAACC-CA-CAAGAAGTA-GTA-TTG-GTAAAT-GTGACA-GAAAAT 

6450 6460 6470 6480 6490 

320 330 340 350 360 370 

—ATBACATGOTi' AGAACAGATG—CATS AGGATATAATCAG—TTTATG—GGATCAAAGCCTAAAGCCATGTG—T 

itiiit t ttt t lit ill i ii i : til i ill till ii i i ill i i 

i i : i i i i : iti i iii til i ii ii iii i tit iiii ii i t til i i 

TTTGACAT-GTGGAA-AAATGACAT-GGTA6AA-CAGATGCATGAGGAT ATAA-TCAGTTTATGGGAT 

6500 6510 6520 6530 6540 6550 6560 

380 390 400 410 420 430 440 

AAAA—TTAACCCCACTCTGTGTTAGTTTAA-AGTGCACTGATTTGG GGAATGCTACTAAT ACCAA 

111 it; iii t i t i i i till i l i * iii i i i i I I i I i I ii 

l i • ill lit l i i i i t tilt iiii til l i l i i i l l l i II 

CAAAGCCT AAAGCCA-T6TGTAAAATTAACCCCACTCTGTGTTAGTTTAAAGTGC—ACTGATTTGAAGAA 

6570 6580 6590 6600 6610 6620 6630 

450 460 470 480 490 500 510 

TACTAGTAATACCAATAGTAGTAGCGGGGAAATGATGATGGAGAAAGGAGAGATAAAAAACTGCTCTTTCAA 

i ; i i t l i i i : i 1 i t I i t i t I i ) i i t i : t i t i t 1 I I I i i ) i i i i i t i i i > t i t 1 I i i i i 1 I t t t i i i 

i t i t t t t i t t i t t t : i : i i i i t i i i t t t t i t i i i t i i t i i i i i i i t i i t t t i i » i i t i i i i i i i 

TGATACTAA' f ACCAATA6TACTAGCGGGAGAATGATAATGGAGAAAGGAGAGATAAAAAACTGCTCTTTCAA 
6640 6650 6660 6670 6680 6690 6700 


520 530 540 550 560 570 580 

TATCAGCACAAGNATAAGAGGTAAGGTGCAGAAAGAATATGCATTTTTTTATAAACTTGATATAATACCAAT 

1 . , ( i j i i i • i t : i i i I t I I t I i I I t I t t t t t t I I I I I I t t I I t I < ■ t i i I i t t i 1 i ! I I l I i i I t t I I I 
| I ; | | | | t * I 1 1 tlllltttllllltlltlilllllltllllltlltlttllltttttflltllllttttt 

TATCAGCACAAGCATAAGAGGTAAGGTGCAGAAAGAATATGCATTTTTTTATAAACTTGAT AT AAT ACCAAT 
6710 6720 6730 6740 6750 6760 6770 


590 600 610 620 630 640 650 

AGATAATGATACTACCAGCTATACGTTGACAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGT 
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AGA'I AATGAT ACT ACCAGCT AT AGCTTGACAAGTTGT AACACCTCAGTCATTACACAGGCCTGTCCAAAGGT 
6780 6790 6800 6810 6820 6830 6840 


660 670 680 690 700 710 720 730 

ATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAATGTAATAATAAGAC 
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ATCCTTTGAGCCAATICCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAATGTAATAATAAGAC 
6850 6860 6870 6880 6890 6900 6910 


740 750 760 770 780 790 800 

GTTCAATGGAACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATC 
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GTTCAATGGAACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATC 
6920 6930 6940 6950 6960 6970 6980 6990 


810 820 830 840 850 860 870 

AACTCAACTGCTGTTGAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGCCAATTTCACAGACAA 
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AAC1CAAC fGCTLTf T AAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGTCAATTTCACGGACAA 
7000 7010 7020 7030 7040 7050 7060 


880 890 900 910 920 930 940 

TGCTAAAACCATAATAGTACAGCTGAACCAATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAG 

I i : I t t t i r : t i ; t i t t 1 I • t : t t i t i t I • I i 1 i i t I I I t I I t i i i i t i i t i i I i i I 1 i t i t I t i t t i i 
t i i t t i i t t t t i t t t i t t i i i t i i r i t i i i l i t i l i t t t l i i t i i l t l l t t t i t i i l i i i t t i i i i i t i l 

TeCTAAAAOCATAATAiZn'ACAGCTGAACACATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAG 
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950 860 970 980 990 1000 1010 

AAAAAGTATCCGTATCCAGAGGGGACCAGGGAGAGCATTTGTTACAATAGGAAAAATAGGAAATATGAGACA 
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AftP^AGAATCCGTATCCAGAGAGGACCAGGGAGAGCATTTGTTACAATAGGAAAAATAGGAAATATGAGACA 
7140 7150 7160 7170 7180 7190 7200 

1020 1030 1040 1050 1060 1070 1080 1090 

AGCACATTGTAACATTAGTAGAGCAAAATGCAATGCCACTTTAAAACAGATAGCTAGCAAATTAAGAGAACA 
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AGCACATTGTA ACATT AGTARAGC AAA AT66AATAACACTTT AAAACAGAT AGAT AGCAAATT AAGAGAACA 
7210 7220 7230 7240 7250 7260 7270 

1100 1110 1120 1130 1140 1150 1160 

ATTTGGAAATAATAAAACAATAATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTT 


it i t t t t i j i i t r t i i 
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ATTCGGAAATAATAAAACAATAATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTT 

7280 7290 7300 7310 7320 7330 7340 7350 

1170 1180 1190 1200 1210 1220 1230 

TAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGTTTAATAGTACTTG 
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TAATTGTGGAGGGGAAT7TTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGTTTAATAGTACTTG 
7360 7370 7380 7390 7400 7410 7420 


1240 1250 1260 1270 1280 1290 1300 

GAGTACTGAAGGGTCAAATAACACTGAAGGAAGTGACACAATCACACTCCCATGCAGAATAAAACAATTTAT 
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RAGTACTGAAGGGTCAAATAACACT6AAGGAAGTGACACAATCACCCTCCCATGCAGAATAAAACAAATTAT 
7430 7440 7450 7460 7470 7480 7490 

1310 1320 1330 1340 1350 1360 1370 

AAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGCGGACAAATTAGATGTTCATCAAA 
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AAAGATGTGSCAGAAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGTGGACAAATTAGATGTTCATCAAA 
7500 7510 7520 7530 7540 7550 7560 

1380 1390 1400 1410 1420 1430 1440 1450 

TATTACAGGGCTGCTATTAACAAGAGATGGTGGTAATAACAACAATGGGTCCGAGATCTTCAGACCTGGAGG 
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TATT ACAGGGCTGCT ATT AACAAGAGATGGTGGT AAT AGCAACAATGAGTCCGAGATCTTCAGACTTGGAGG 
7570 7580 7590 7600 7610 7620 7630 

1460 1470 1480 1490 1500 1510 1520 

AGPAi ^ ATA TG AC’GG ACA ATTG5 AG A AGTG A ATT AT AT A AAT AT A AAGT AGT AA A AATTG AACC ATT AGG AGT 

i : : 

AGGAGATA' rGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGT 
7640 7650 7660 7670 7680 7690 7700 7710 

1530 1540 1550 1560 1570 1580 1590 

AGCACCCACCAAGGCAAAGAGAA.GAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCT 
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AGCACCCACCAAGGCAAAGAGAAGAGTEGTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCT 
7720 7730 7740 7750 7760 7770 7780 

1600 1610 1620 1630 1640 1650 1660 

TGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCACGGTCAATGACGCTGACGGTACAGGCCAGACAATT 
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TGGuTTCT fEGGAGCA.GCAGGAASCACTATGGGCGCAGCCTCAATGACGCTGACGGTACAGGCCAGACAATT 

7790 7800 7810 7820 7830 7840 7850 
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AT TGiTCTG! TALVrGCAiGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAACAGCATCTGTTGCAACT 
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ATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAACAGCATCTGTTGCAACT 
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C AC;' ViTCTi rBSSGCATCAAGCAGCT CCAQSCAAGAATCCTGGCTGTGGAAAGAT ACCT AAAGGATCAACAGCT 
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CACAGTCTGGSGCATCAAGCAGCTrXAAI^AAG^ATCCTAGCTGTGGAAAGATACCTAAAGGWTCAACAGCT 
7930 7940 7950 7960 7970 7980 7990 


1820 1830 1840 1850 1860 1870 1880 

CCT&;nGGAiTT,::G6'7;TTGCTCTGGAAAACTCATTTGCACCftCTGCTGTGCCTTGGAATGCTAGTTGGAGTAA 
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CCTAGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAA 
8000 8010 8020 8030 8040 8050 8060 8070 

1890 1900 1910 1920 1930 1940 1950 

TAAATCTCTGGAACAGATTTGGAATAACATGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAG 
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TAAATCTC'f'GGAACAGATCTGGAATCACACGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAG 
8080 8090 8100 8110 8120 8130 8140 
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CTT AATACAT7CGTTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGA 
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CTTAATACACTCCTTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAAT6AACAAGAATTATTGGAATTAGA 
8150 8160 8170 8180 8190 8200 8210 
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T AA ATGGGi ’ A Ai ;:TTT C-H'GG AATTGGTTTAACAT AACAA ATTGGCTGTGGT AT AT AAAAAT ATTCAT AATGAT 
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rAAATGGGCAA9fTTGTGGAATTGGTTTAACATAACAAATTGGCTGTGGTATATAAAATTATTCATAATGAT 
8220 8230 8240 8250 8260 8270 8280 


2100 2110 2120 2130 2140 2150 2160 2170 

AGT AQG AGGCT *i ’GGTAGGTTT A A6i A AT AGTTTTT6CTGT ACTTTCT AT AGTGA AT AG AGTT AGGC AGGG AT A 
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AGTAGGAGGCT7GGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATA 
8290 8300 8310 8320 8330 8340 8350 


2180 2190 2200 2210 2220 2230 2240 

TTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACAGGCCCGAAGGAATAGAAGAAGA 
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TTCACCATTATCGTTTCAGACCCACCTCCCAATCCCGAGGGGACCCGACAGGCCCGAAGGAATAGAAGAAGA 
8360 8370 8380 8390 8400 8410 8420 8430 

2250 2260 2270 2280 2290 2300 2310 

AGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCCTTAGCACTTATCTGGGACGATCT 
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AGG'iLSGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGiATCCTTGGCACTTATCTGGGACGATCT 
8440 8450 8460 8470 8480 8490 8500 


GC 


2320 

u.GASCCTTS - 


2330 2340 2350 2360 2370 2380 

iSCCICTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAAC 


t i 
t i i 


GCGKAGCC -TCsTGCC 
8510 


TCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAAC 
8520 8530 8540 8550 8560 8570 


2390 2400 2410 2420 2430 2440 2450 

TTC7 fSGGACGCAGCGnGTSGOsAAGCCCTCAAATATTGGTGGAATCTCCT ACAGTATTGGAGTCAGGAACTAA 

... i i i i t i i i i i i i l i i i i i i 
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TTC7 GGGACCCAlviGGGilsTGGGAAECCCTCAA AT ATTGGTGGAATCTCCTACAGT ATTGGAGTCAGGAACTAA 

8580 8590 8600 8610 8620 8630 8640 
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AATAAGAGAAAGAGCAGAGAUAGTGGCAATGAGAGTGAAGGGATCAGGAAGGAATTAT-CAG-CACTTGTGG 
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AAAIGGGGCACCATGCTCCTTGGEMTATTGATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAGTCTAT 
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TAT C.GGGTACCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACA 
1050 1060 1070 1080 1090 1100 1110 1120 

210 220 230 240 250 260 270 280 

GASGTACA' i Rf-Vl G'TTI GGGCCAC ACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGTATTGGTA 
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G AGGTACA' f AATATTT 6GGCCACACATGCCTGTGT ACCCACAGACCCT AACCCACAAGAAGT AGT ATTGGGA 


1. 130 

1 140 
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1 160 

1 170 

1 180 

1 190 
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AATGTGACAGftAAATTTTAACATG TGGAAAAATGACATGGTAGAACAGATGCATGAGGATATAATCAGTTTA 


i i : i i t t t 


AATQTEACAGAAAATTTTAACATGTGGAAAAATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTA 
1200 1210 1220 1230 1240 1250 1260 

360 370 380 390 400 410 420 

TGGSiATCAArtGCCTftAAuCCr :TGTT:T AAAATT AACCCCACTCTGTGTTAGTTT AAAGTGCACTGATTTGGGG 
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TGGGATCAAAGC'CTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTAAATTGCACTAATTTGAGG 
1270 1280 1290 1300 1310 1320 1330 

430 440 450 460 470 480 490 

AATC iCTAC IT. ATACCAATACTAGTAATACCAATAGTAGT-AGCGGGGAAATGATGATGGAGAAAGGAGAGAT 
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AAT GATACTAGCACCAATGCT ACTAAT ACCACT AGT AGT AATCGGGGAAA-GATGGAGGGAGGAGAAAT 

1340 l350 1360 1370 1380 1390 1400 

500 510 520 530 540 550 560 

AAAAAACTGCTOTTTCAATATCAGCACAAGNATAAGAGGTAAGGTGCAGAAAGAATATGCATTTTTTTATAA 
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GACAAACTGCTCTTTCAATATCACCACAAGCATAAGAAGTAAGGTACAGAAAGAATATGCACTTTTTTATAA 
1410 1420 1430 1440 1450 1460 1470 


570 590 550 600 610 620 630 640 

ACT T G AT A I A.A i ACCAAT AGAT AATGATACT ACCAGCT AT ACGTTGACAAGTTGT AACACCTCAGTCATT AC 
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i t i i i i i i l i i t t i i i i t i i i i t i i i i i t t i i i t i l l i i i i i i i i i t t i t ii » i i i i i it 

i i i i t i t i i i i i i t i i i i i i t i i i i i i i i i i i i i i i i i ■ i i i i i t i i i ii ii till ii 

AAACCCTAAAACCATGTGTAAAGCTAACCCCACTCTGTGTCACTTTAAACTGCACTAATGTGAATGGGACTG 
6140 B150 6160 6170 6180 6190 6200 

430 440 450 460 470 480 490 

CTACTAAT-ACCAA7ACTAGTAATACCAATAGTAGTAGCGGGGAAATGATGATGGAGAAAGGAGAGATAA 

i : l t i i t r i it ; r i l i i l t t t i ill ill i t t i i i l i i i l i i 

it tti it i t ii i i i i t t i i t ii i itt lit i i t i i i i i i i t i i 

CTGTGAATl-'iGGACTAATGCTGGGAGT-AAT AG6ACT AATGCAGAATTGAAAATGGAAATTGGAGAAGTGA 

6210 6220 6230 6240 6250 6260 6270 

500 510 520 530 540 550 560 570 

AAAACTGC , r CTTTCAATATCAGCACAAONATAAGAGGTAAGGTGCAGAAAGAATATGCATTTTTTTATAAAC 

: i i ; : i l l t t i i t i i i i t i t i t t it) i i i i i l i i i i i i i i l i i i i i i t i i i i i 

i i i i t i : i ; > i t i i t i t i i tt tit i tii it i i i i i i i i i i i i i i i i i i i i i 

aaaactso rcn' ccaa r ataaccccagtagsaagtgataaaaggc—aagaatatgcaactttttataacc 

6280 6230 G300 6310 6320 6330 6340 

500 590 600 610 620 630 

TTGA'f ATAATACCAATA GATAATGATACTACCAGCTATACGTTGACAAGTTGTAACACCTCAGTCATTA 

: i i i ii t t i tit: i i f j t t i i ii it i i i i i i i ii i i i i t i i i i i t i i i till 

: i i t i ii i t : i t i i iiii t t t i i it it tilt i t ■ ti i t t i i i i i i i i i t i till 

77GATCTAGTACAAATAGATGATAGTGATAATAGTAGTTATAGGCTAATAAATTGTAATACCTCAGTAATTA 
6350 6360 6370 6380 6390 6400 6410 


640 650 660 670 680 690 700 710 

GACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTC 

• * : ' I I I I 1 I , 1 1 :! I I I I ! t t 1 t t ! 1 t I 1 I t t I I 1 1 1 1 I I I I I I I I 1 t I 1 I 1 I t I I I I I 1 1 till 

I i - i t i i i : i i i t i • I i t t i i i i i i i i i i i t i ; i i i i i i t i i t t i i i t t i i i i t t i i i i i i i tilt 

Q40YTG6CT HC' OQA A AGGT A4CCTTTG ATCCA ATTCCC AT AC ATTATTGTGCCCC AGCTGGTTTTGC AATTC 

6420 64? SO 6440 6450 6460 6470 6480 


720 730 740 750 760 770 780 

TAAAATGTAATAATAAGACGTTCAATGGAACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATG 

til) I I I I I I I I t 1 I 1 I ! [ J t ! I I I 1 I t I I t 1 t I 1 I I 1 I t I t I t I 1 I I I I I 1 I 1 I I 1 I I I I t I 

t i i i i t i i t • : t i i i i i i r i t i i t i i t t t i t i i i i i i * i i i i i i t i i i i i ■ i i i i i i i i i i 


i 'AAARTG7 AATClAT AAGAAGTTCAATGGAACGGAAATATGTAAAAATGTCAGT ACAGTACAATGT ACACATG 


6490 6500 6510 6520 6530 6540 6550 6560


790 600 810 820 830 840 850 

laftPfi ” r AGGCC AGTAGT AT C AACT G‘A ACTGCTGTTGA ATGGC AGTCT AGCAGAAG AAG AGST AGT A ATTAGAT 

i ; i i : i i t • i i » t i i t : t j i t i t t r t i i i i i i > t i i i t i i i i ■ t i i t i i i i i i » i i i i i t i i i 

i i t : i i i i j t i i ii i i i t i i i i i i i t i i i t i i » t t i i i i i i i i i t i i i i i i i i i ii i i i i i i i i 

GAATTAASCCAC: fG 3TGTCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAAGAAGAGATAATGATTAGAT 
6570 6580 6590 6600 6610 6620 6630 


960 

CT6CCAAI r T 


C 


I III ! 
t III t 

3AAAATCT 

5:540 


870 880 890 900 910 920 

‘.C AG 05 A ATGCTf1 AA ACC AT AAT AGTACAGCTG A ACC AATCTGTAGAAATTA ATTGTACA A 


r t i i t I I I i i 

1 t ! ! [ I 1 t I f 


:aat 


• it t i i i i i t i i i i i i i ■ it ti i i i i t i i i i i i i i i i i i i i 

iii i i i i » i t i i i i t i i i it ti i t t i i § i i i i i i i t i ■ i i i 


.AAftACATAATAGTACAGCTTAATGAAACTGTAACAATTAATTGTACAA 
;SSO 6670 6680 6690 6700 





£m.» 93G SSO 970 980 990 

t-a/AaCCftACnftCAni^C^SPlf-^mRTftTCCGTOTCCftGAeGSBACCASGQASAQCATTTQTTACAATAGGAA 


t t i : i t t t t i t 


i i t t i tit 


GGCCrrSQfV^^Cf^’n^C^S^^Sft&tvGAl'ACATTTC- 
670 6720 9730 6740 


—GGCCCAG66CAAGCACTCTATACAACAGGGA 
6750 6760 6770 


100O in;.;;, 1020 1030 1040 1050 1060 1070 

f WVl 'AGSAA AT A t GAGACAAGCACATTCsiT AACATTAST AGAGCAAAATGCAATGCCACTTT AAAACAGATAG 


t • i i » t i ttiit 


TAGYf'iGGAGATA rAAGAAGAGCATATTGT ACT ATT AATGAAACAGAATGGGAT AAAACTTT ACAACAGGT AG 
6780 3730 6800 6810 6820 6830 6840 

1080 1030 11OO 1110 1120 1130 1140 

CTA6CAAA i • 7AAGAGAACAATT rGGAAATAATAAAAC^ATAATCTTTAAGCAATCCTCAGGAGGGGAOTCAG 

t i i ; t rt it i i ii t i ii i ii i I i i i i i i i I i t i i t i t i i i i i i ■ I i ■ i i i ■ 

i i r;i : i t t i I ii it t i i ii t t i I i I i i i i i i 1 i i I i I I I I I I i I i i i I i I 

CT51 ' A AAACTAGGA-AGSC—CTTC fT AACAAAACAAAAAT AATTTTTAATTCATCCTCAGGAGGGGACCCAG 
6650 8880 6870 6880 6890 6900 6910 

1150 1160 1170 1180 1130 1200 1210 

AAATTGTAnCGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTA 


i t t » i i t i i i 


i i i i i i i i i i i 


<¥08 iTACAACACACAGTTTTAAT1'GTAGAGGGGAATTTTTCTACTGTAATACATCAAAACTGTTTAATAGTA 
6920 9630 6940 6950 6960 6970 6980 


1220 1830 1240 1250 1260 1270 1280 

CTT6GTTTAAT AGT ACTTGGAGT ACTGAAGGGTCAAAT AACACTGAAGGAAGTGACACAATCACACTCCCAT 


7000 


7010 


-TAATAGCACAGAGTCAACTGGTAGTATCACACTCCCAT 
7020 7030 7040 7050 


1280 1300 1310 1320 1330 1340 1350 

QCAGAATAAAAOAATTT AT AAACATGTGGCAGGAAGT AGGAAAAGCAATGT ATGCCCCTCCCATCAGCGGAC 

i : i i i t i i t t : t t : : i i i t t t t i i i i i i i i i i i i i t t i i i i ... i i i i i i i i i i i i i 

i i i i » i i t i t i t i i t i i i i i t t t i i i i i t i li i i i i i i » l i i i i i i i i i i i i i i t i i i i tit 

GCftGAAT AAAACAAATT AT AAATATGTGGCAGAAAACAGGAAAAGCT ATGT ATGCCCCTCCCATCGCAGGAG 
70.:,i.; 7070 7080 7090 7100 7110 7120 

1360 1.470 1380 1390 1400 1410 1420 

AMAT"! AGA’OTiTTGATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTAATAACA-ACAAT-GGG 

I ; 1 l ; I : : i l I t • ! I I I t : 1 I t 1 I I tttt*ttll***»**«* ti till I I I l I I I 1 

t ■ i i : ; i i t i t i i i • i i i t i t i i i t i i t i t i i i i i i i i i i i i i t till l i i i i i i l 

TCATCAACTGTT 7ATCAAATATTACAGGGCTGATATTAACAAGAGATGGTGGAAATAGTAGTGACAATAGTG 
7130 7140 7150 7160 7170 7180 7190 

1430 1440 1450 1460 1470 1480 1490 

TC—CGAGA1 ClTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAG 

i till iii : t t i t i t : t t t t t t t i i i i i i i i i i i i i i t i i I i i i i i i i i i i i i I i i I i t i I i i i i 

i ttii * i i i i i t ■ i i t i ; i i t i i i t ■ i i t t i i t t i i t i i i t i i i i t i i i t t i t i i i i i i i i i i t i 

ACAATGAuiACOI T A AGACCTCOGAGGAGGAGAT ATGAGGGACAATTGGAT AAGTGAATT AT AT AAAT AT AAAG 
7800 7210 7220 7230 7240 7250 7260 

1500 1510 1320 1530 1540 1550 1560 

TAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAG 

i i i i i i r i i : i i i i f i i i i t i i i t i i i i i i i t i i i i .. i t i i i i i i i i i i i i i i i i i i i i i 

t i « i i t t t i t t i i t t i i t i i i i i i i t t t i t f i t i i i i i i i i i i i i • i i i i t i t i i i t i i i i i t i i i i 

TAfVTAAGAA'TTGAACCCCTAGGASTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGGAAAGAGAAAAAAGAG 
1870 7280 7290 7300 7310 7320 7330 

1570 1590 1590 1600 1610 1620 1630 1640 

CAGTGGGAATAGGAGCTTTG.iiTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCACGGTCAATGA 

|| | lit l t * t 1 t I I ! I I t I t I I t 1 I I I I I I I t t I I I I I 1 1 I I > I 1 t I 1 I I I I I I I 1 till I 1 

:t i it: I I I i i I i i I I i t I r I I I I 1 1 t I I I i I I I I I I i i I * I I 1 I I I t t t I I I I I I I I I I 

CAAVAGGACTAriGAGCCATGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACGATGGGCGCAiGCGTCACTAA 
7340 7350 7360 7370 7380 7390 7400 7410 

1650 I860 1670 1S30 1690 1700 1710 

6G5 rGACQ ■.;TA5AGGCCAGA5AA7TATTGTCTCiGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTG 


CGG- 'i '-lACG i-iTAr/Ai? 1 ?. ;CA6AC;471 !'ACTG T CTGGTATAGTGCAACAGCAAAACAATTTGCTGAGGGCTATAG 
7420 7430 7440 7450 7460 7470 7480 



1720 :>7"!0 5.740 1750 17B0 1770 1780 

AGST-GCAA* lAGCATCn is rns« IftftCrCftCAtaTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGG 

i • t I ! • t I 1 i I > • t : t t i : t t 1 1 t i ! ! I t I i : i i i J i I i t i III.I I i i i i I t t t I I I 1 I i I I I 

• i i t t i i i i i ; * > i t i i • i t t * i i i i • i j t t t i t i i » t it i i i i i i i t t t t t i t i ■ > i i i i i ■ i i i i 

AKSCBCftACASCm-C'nSTTGCft^CTCACeGTCTSQQGCATTAAACAQCTCCASGCAAGAGTCCTQGiCTQTQQ 
7490 7500 7510 7520 7530 7540 7550 

1790 1500 ISiO 1820 1830 1840 1850 

ftAAGATAD ;TAftASGAYCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGO*CCACTGCTG 

t t i : i : j t • : - i i r r i : t i t i i : i I i I : I i i i i i i t i i i t i l i i t I i I I I i i I I .. I I 

t I I i I i i t ( I t i I r II II I t t i : t i t i i i t I i I I i I I I I i i i i i i i i i i II 

AAAi?,;''iTACi:77-‘G; vGG ATCAACGGCTCCi AGGAATGI GGGGTTGCTCTGGAAAACACATTTGCACCACATTTG 
75SO 7570 7550 7590 7600 7610 7620 

1860 1870 1060 1890 1900 1910 1920 

TGCCTTGGAATGGT AGTTGGAGTAAT AAATCTCTGGAACAGATTTGGAAT AACATGACCTGGATGGAGTGGG 

| | ■ J ! t t t 1 1 : I I I l ! i t t I ’ I l J I 1 ! I I I II I t I I I I I 1 . I t I I I I I t 1 I I I I 

t I I f t I I I I I I I i : t I t I ! t I « I I I i i I I I II I t » I I i t I I I I I I I I t I i i t I I i t i i i I I i 

TGCGTTGG,GftC'! TIT AG rTcG AGT ft AT ftGftTCTCT ftGATGftCftTTTGGftAT AftT ATGACCTGGATGCAGTGGG 

763u 7640 7B50 7660 7670 7680 7690 

193!.l 1340 1950 1960 1970 1980 1990 2000 

ft.CAGftGAAftTTftACftA'TTftCftCAftGCTTftftTACATTCCTTAATTGAAGAATCGCAAAACCAGCAAGAAAAGA 

( i i i • i ; r ( t i i i i i i i t i i ti i i I t t t 1 i i i I i I i i I i i I i I I i I i i i i i t i i i i t i i i i i 

i : j i i i t j i f i t i t i i : j ( tt i i t i i i i : t i i i i i i i i i ■ i .. i t i i i i i l i l i i i i 

AAAAAGAAft' i ‘ VAGCftATTACACAGGCATAATATACAACTTAATTGAAGAATCGCAAATCCAGCAAGAAAAGA 
7700 7710 7720 7730 7740 7750 7760 7770 

201 >:; ; >020 2030 2040 2050 2060 2070 

ATGAACAAGm.AT f AT'VGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTAACATAACAAATTGGCTGT 

i i i : t i ; t ; i t i : i t i i i i t 1 i i i t i i i i : i i i i i i i i i i t i I i i I i I i i I I i till i l i t t i i 

till 1 : t i ; t i i : t i t i t t ti t i .. i i i i i i t t i i i i i i i i i i i i i till i i i i i i i 

ATGftAAAGGAA'i" f ATTGGAATTGGACAAGTGGGCAAGTTTGTGGAATTGGTTT AGCATATCAAAATGGCTGT 
77AO 7790 7000 7810 7820 7830 7840 

2030 2030 7100 2110 2120 2130 2140 

66 i'ATATAAAAATATI CATftft'"GATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTA 

] ; ! : 1 1 I I ; ! I ( I I i : i i i i i i i i i i ; i i i i l t i t i i < t t l i t i t i l i i i i t i i i i l i t i i i i i 

i j t i i t i i : l ! i t i i i i i i i i t i t l t t I I I i i i i i i i i i i i i i i i i t I | i i i i i i i i t i i i i i i 

RG7ATATAAGAATATTCATAATAGTAG7AGGAGGCTTAATAGGTTTAAGAATAATTTTTGCTGTGCTTTCTT 
7650 7860 7670 7880 7890 7900 7910 

2150 2160 2170 2180 2190 2200 2210 

Tf-.GYGAATAGAG FT AGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGG-AC 

t : i : I i i t i t • i i i i : » i I i I : I I t t i t t I I I I i i i i i I » t i i t i i i i i I I i I I t I i i i i I 

i i ; i • l I : I ; t 1 : 1 I f ! t I t t I I 1 i I t ) I I I i I I t i i i I i I I > I I I I I I t I i i I I I i i t t I 

T ACT; ‘AA ATAG A G T"T AGGCAGGGAT ACTC lACCTCTGTCGTTGC AG ACCCTCCTCCCAACACCGAGGGGACCAC 
77120 7930 7540 7950 7960 7970 7980 

2220 2230 2240 2250 2260 2270 2280 

rCGACAGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACG 

( » | ■ I I I t I 1 I l 1 I l I I I I t I 1 I t l ! I I I I I > I I 1 I II I 1 I 1 I I I I I I I I I I I I I I I I I I I I I I 

I t ( ! I t I l l t I I I I * I l I I I * I 1 I f I I I I I I t I I I I i I I I I I » I I I I I I I I I | I I I I I I I I I I I 

CCGftCAGGCCCGAAGGAATAGAAGArtGAAGGTuiGAGAGCAAGGCAGAGGCAGATCAATTCGATTGGTGAACG 
793> ) 8000 SO10 8020 8030 8040 8050 

2250 2300 7310 2320 2330 2340 2350 

GAT*.' !CTT AGCftCT" ATCTGGGftCC'AT CTGCGGAGCCTTGTGCCTCTTC AGCT ACCACCGCTTGAGAGACTT A 

, , t t i i i i i > i t : ; I t t t t i i i t tit lit ii t i i t i l i ■ t i i i t i t i t I t i i t l i t i l l > ■ i ■ i t 

( 1 l| I | | I T I 1 I I I 1 I t 1 1 III lit II t 1 I t I I t I I • I I I | | | | | I I t I I I I I t I t I 1 I I I 

GATTCTCAGCAC 'fTftTCTGGGACGftCCTGAGGAACC—TGTGCCTCTTCAGTTACCACCGCTTGAGAGACTTA 


8060 8070 8080 8090 8100 8110 8120

2360 2370 2380 2390 2400 2410 2420 2430


CTC'i TGAT' fGTAftCGAGGAT fGTGGAACTTCTGGGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGGAAT 

, , , i i i j i i I i i t i t i i : i t t i i i i i • ■ i t i • i « i i i t i t l t I I I i i I i i i I i i l l l t l l l l l l l l l 
: i i t ii- * i t t > i t t i i : - i i t t t t i * i t t i i i i t i i i i i i i i i t i i t i i i t i i i i » i i i i i i i i i 

(TP : t TAAT fGCA:‘CGfYGGAT :’GTCtGAACTTCTGSGAC6CAGGGGGTGGGAAGCCCTCAAATATCTGTGGAAT 
3130 13 .'.40 3150 8160 3170 8180 8190 8200 

24 4 G 2450 X 

C 7 CG fACAr-rTf-Vi V GGXGTCAGGAACTAAAG 
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3 ole by sheppard on Thu 8 Mar 90 12* 16 3 37 PST. 


Query sequence being compared: KUNZ-158-CL3P.SEQ

Number of sequences searched: 31228

Number of scores above cutoff: 33

Results of the initial comparison of KUNZ-158-CL3P.SEQ with:
Date ben!' " ErnRer.:-: r-; 2 , n all entries 

Cbcnb'an 

[Histogram: NUMBER OF SEQUENCES (logarithmic axis, 1000 to 100000) plotted against SCORE and STDEV]



Similarity matrix
Mismatch penalty
Gap penalty
Gap size penalty
Cutoff score
Randomization group

Initial scores

Optimized scores


Scores:


Times:


Number of residues:
Number of sequences searched:
Number of scores above cutoff:


The scores below
Significance is

A 100% identical


The list of other


Sequence Name


1. HIVBRUCG
2. HIVPV22
3. HIVHXB3
4. HIVBH102
5. i-irvi
6. HIVNL43
7. HIVELICG
8. HIVSC
9. HI VH73FM8
10. HXVZ3:-.'
11. HIVMNCG
12. HIVZ2Z6
13. f-OVTG
14. HIVJH32
15. HTVJVl
16. H J V'CDCr 2
17. HIVMAL


PARAMETERS
Unitary


■i \ 13 
■r>-£\SC; 


i oo 
O. -'3 
1 45 
o 

70 

no 


K-tuple               4
Joining penalty      30
Window size          32

Alignments to save   10
Display context       0


SEARCH STATISTICS

Mean Median 

37 36 


u^s ■ 

, ices 
ad'* 


GO "S3 ”70.. 05 


searcheds 
iVS cutu-! v :■ 


Standard Deviation
19.78

Total Elapsed 
02s 16 533- 00 


37183950 
31228 
j’>Q 


are sorted by initial score.
calculated based on initial score.


sequence to the query sequence was not found.


t. scr 


~es 


Description                               Init.   Opt.
                                  Length  Score   Score    Sig.   Frame


**** 115 standard deviations above mean ****
Human immunodeficiency virus t    9223   2321   2436   115.50   0
**** 93 standard deviations above mean ****
Human immunodeficiency virus t    9770   1877   2180    93.04   0
**** 92 standard deviations above mean ****
Human immunodeficiency virus t    3156   1873   2176    92.84   0
Human immunodeficiency virus t    8932   1872   2176    92.79   0
Human immunodeficiency virus t    9718   1858   2164    92.08   0
**** 85 standard deviations above mean ****
Human immunodeficiency virus t    9709   1729   2169    85.56   0
**** 61 standard deviations above mean ****
Human immunodeficiency virus t    9176   1246   1893    61.14   0
**** 56 standard deviations above mean ****
Human immunodeficiency virus t    4273   1159   2139    56.74   0
**** 54 standard deviations above mean ****
Human immunodeficiency virus t    3563   1112   1761    54.36   0
**** 52 standard deviations above mean ****
Human immunodeficiency virus t    3457   1066   1959    52.03   0
**** 48 standard deviations above mean ****
Human immunodeficiency virus t    9738    996   2203    48.49   0
**** 47 standard deviations above mean ****
Human immunodeficiency virus t    9081    983   1908    47.84   0
Human immunodeficiency virus t    5159    983   1915    47.84   0
Human immunodeficiency virus t    2903    975   1365    47.43   0
Human immunodeficiency virus t    2653    972   1917    47.28   0
**** 45 standard deviations above mean ****
Human immunodeficiency virus t    3373    932   2191    45.26   0
**** 44 standard deviations above mean ****
Human immunodeficiency virus t    9229    916   2041    44.45   0
**** 42 standard deviations above mean ****









18. HIVRFENV   Human

19. HIVRF      Human

20.            Human immunodeficiency virus t   3737   883
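The "Sig." column of the initial-score list appears to be the number of standard deviations by which each initial score exceeds the mean of all scores. A minimal sketch of that calculation follows. The mean of 37 is the rounded value printed under SEARCH STATISTICS; the standard deviation of 19.78 is an assumed reading of the printed figure, chosen because it reproduces the listed significances to within rounding. The function and constant names are illustrative and are not taken from the search program.

```python
# Sketch: significance as standard deviations above the mean score.
# Both constants are assumptions read from the printed search statistics.
MEAN_INITIAL_SCORE = 37.0     # "Mean" as printed (rounded)
STDEV_INITIAL_SCORE = 19.78   # assumed reading of "Standard Deviation"


def significance(score, mean=MEAN_INITIAL_SCORE, stdev=STDEV_INITIAL_SCORE):
    """Return how many standard deviations `score` lies above `mean`."""
    if stdev <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - mean) / stdev


if __name__ == "__main__":
    # Initial scores taken from the hit list above.
    for initial_score in (2321, 1877, 1246, 916):
        print(initial_score, round(significance(initial_score), 2))
```

Because the printed mean is rounded, the computed values can differ from the listed ones in the second decimal place.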

The scores below are sorted by optimized score.

Significance is calculated based on optimized score.

A 100% identical sequence to the query sequence was not found.


2622    887   1567   42.38   0
912S    887   1567   42.38   0
3737
Nucleotide sequence of the AIDS virus, LAV
Cell 40, 9-17 (1985)
full staff_review

2 (bases 17J2 to 1749; revision of [1])

Alizon,M., Wain-Hobson,S., Montagnier,L. and Sonigo,P.

Genetic variability of the AIDS virus: Nucleotide sequence analysis
of this virus has been constructed by Keith Peden, Molecular Biology
and Genetics, Johns Hopkins University School of Medicine,
Baltimore, MD 21205 (301) 355-3B52. HIVNL43 is also an infectious
clone having for its 3' half a clone of the BRU isolate.

Acquired immune deficiency syndrome (AIDS) is caused by a
retrovirus known by several different names, probably representing
two separate strains: human T-cell lymphotropic virus-III
(HTLV-III) and lymphadenopathy-associated virus (LAV) are thought
to be one strain, and AIDS-associated retrovirus type 2 (ARV-2) the
other. All three viruses, whose sequences do not differ by more
than about 6%, are believed to belong to the retroviral subfamily
Lentiviridae, or "slow" viruses.

For the details of the annotation and for other pertinent 
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stop codon at position 
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gag polyprotein 
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2031'; in-frame stop codon at 3783) 
vif protein
vpr protein

tat protein, exon 2 (first expressed exon)
tat protein, exon 3 (AA at 8337)
rev protein, exon 2 (first expressed exon)
rev protein, exon 3 (AA at 8338)

vpu protein (premature termination)
envelope polyprotein

nef protein (premature termination at 3357
relative to other HIV-1 sequences)


pre-msg    454   9655   genomic mRNA

pre-msg    454   9655   tat, rev, nef subgenomic mRNA

IVS
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579 X 

tat, rev, nef subgenomic mRNA intron
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8395 
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LTR 
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5’ LTR 

LTR 
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3’ LTR 

rpt. 
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Sp1 binding site III 

binding 
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Sp1 binding site II 

binding 
Sp1 binding site I

binding
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primer (Lys-tRNA) binding site 

site      3783   3785   pol cds in-frame stop codon

signal    3631   9636   mRNA polyadenylation signal

BASE COUNT     3483 a
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ORIGIN 


Left end of viral genome 


Initial Score    =  996   Optimized Score = 2203   Significance = 0.00
Residue Identity =  90%   Matches         = 2233   Mismatches   =  170
Gaps             =   53   Conservative Substitutions           =    0
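The residue identity in this summary can be related to the match, mismatch and gap counts printed beside it. The sketch below uses one common definition: matches divided by the total of matches, mismatches and gaps. The report does not state the exact denominator the search program used, so counting each gap once is an assumption, and the function name is illustrative only.

```python
# Sketch: percent identity from the counts printed in the alignment summary.
# Assumption: each gap counts once in the denominator.


def residue_identity(matches, mismatches, gaps):
    """Return the fraction of aligned positions that are identical."""
    aligned = matches + mismatches + gaps
    if aligned == 0:
        raise ValueError("alignment has no positions")
    return matches / aligned


if __name__ == "__main__":
    # Counts from the summary above: 2233 matches, 170 mismatches, 53 gaps.
    identity = residue_identity(2233, 170, 53)
    print(f"{identity:.1%}")
```

Under this definition the counts give a value close to the printed identity. A program that counts every gap character rather than every gap would report a slightly lower figure.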


x lO 20 30 40 50 60 70 

ATGAGAGTG:AA'3f AGAAATATCACCACTTGTG13AGATG6GG5TGGAAATGGGGCACCATGCTCCTTGGGATA 

i ; ; ! ; ! i i > ! '< > '< > > > • 

ATGAGAGT6AA0.3—GC1ATC AGGAGGAATT AT—CAG—CACTGGTGGGGATGGGGCACGATGCTC-CTTGGGTTA 
6240 3250 6260 6270 6280 6290 6300 

SO 30 lOO llO 120 130 140 

TT6ATGATCTGTA6TGCT ACAGAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAGGAAGCA 

: ; ; : ; : ;;;ii;: ; :;;;:::;!i:i ; : i : : ! : ; : 

TTAATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCA 
6310 6320 6330 6340 6350 6360 6370 

150 160 170 ISO 130 200 210 

ACCACCACTCTA FTT"l GTGCATCABATGCTAAAGCATATBATACAGAGGTACATAATGTTTGGGCCACACAT 

; ; ; ; ; ; ; ; ; ; ; I ! ! I ; I I I ! J I I I ! I ! I I I I J J ! 3 I I ! I * * • 1 * * 1 1 • 1 * • * ! ■ 1 1 * 1 1 1 1 1 1 1 ' * ' ' * ' 
ACCAGCACTCt ATTTTGiTGCATCAEiATGCT AARGCfTTATGAT ACAGiAGCaT A CAT AATGTTTGGGCCACAGAA 



6380 6390 6400 6410 6420 6430 6440 6450


220 230 240 250 260 270 280 

C5CCT GTGTACCC/-1CPCA'CCCCAACCCACP.AGAA6T POT ATTGGT AAATGTGACAGAAAATTTTAACATGTGG 

t i t ... ..! 1 ! ! ! I ! I ! I I I I I J ! I ! ! I I I I I • < < < • * * * * • * 1 1 * 1 1 1 1 1 

0i-i—r OTHTPCCOnC'^OPCCCC A ACGC ACA AG AAGT AG AATTGGT A A ATGTG AC AG A AA ATTTT A ACATGTGG 
"" 6460 6470 5480 6430 6500 6510 6520 

230 300 310 320 330 340 350 360 

^Puq^^TGACATGGTAHAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTA 

.* » * * ■ • ' ''' !!!!!!!!;! ! ! ! ! ! I 1 \ I I 1 ! ! . . ! I ’ « . « 1 • * 1 1 ' 1 ' 1 ' 1 1 ' 


AAAOATAAOATGGTALIAACAGATC-“CATGAGGATATAA 
6530 6540 6550 6560 6570 6580 6590


370 380 390 400 410 420 

AAATTAACCCCACTCTGTGTTAGTTTAAAGTGCACTGATTTGGGGAATGCT act AATACCAATACT AG 

;;;;;;;;; f !l I ! I !!!!!! I I ! ! I ! I I ! ! ! ! I ! ! I ! !!!!.!.•.* * * • 

TAACCCCACTCl 'GTG'i TACT’l'T AAATTGCACTGATTTGAGGA AT ACT ACT AATACCAATAATAGTACT 
6600 ~ 6610 6820 6630 6640 6650 6660 

43 Q 440 450 460 470 480 490 

—TAATAnCAATAGTAGTAGCGGGGAAATGATGATGGAGAAAGGAGAGATAAAAAACTGCTCTTTCAATATC 

, , , , , , , . it i t i i t t i > i i > > • i t i * i t * ' 

ittii i i i i i i i : ; i i t i t i it j [ | j | ] ( i i t i 1 t i .. t l l l i l i ( i l i l i l 

GCTAATAACAATAGTAATAGCGAGGGAACAATAAAGG-GA—GGAGAAATGAAAAACTGCTCTTTCAATATC 
6670 6680 66S0 6700 6710 6720 6730 

500 510 520 530 540 550 560 570 

AGC AC A AGN AT A AG AGGT AAGGTGC AGA A AGAAT ATGC ATTTTTTTAT A A ACTTGAT AT AAT ACC A AT AGAT 


i i i i i i i i i ■ ■ < < 


i i t t i t i i i t i i i i i i i i 

t | | | i i i t i ii ******* 


ACCACAAGCAT AAGAG AT AP.GATGCAGAAAGAAT 
6740 6750 6760 6770 6780 6790 6800


530 5S0 600 610 620 630 640 

AATGATACTACnAGCO ATACRTTGACAAGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCC 
::::::::::: ::::::::: ::::::::::::::::: :: ::::::::: ::::: 
AATGATAGTACCAGCTATAGGTTGATAAGTTGTAATACCTCAGTCATTACACAAGCTTGTCCAAAGATATCC 
6810 BS20 6830 6840 6850 6860 6870 6880 

650 660 670 680 690 700 710 

TTTG AGCC AATT CCCi-V fACATTATTGTGCCCCGGCTGGTTTTGCGATTCT AAAATGT AAT AAT AAGACGTTC 


. i i i t l t i i i i i i t i i i i i i i t i i i i i i i i i 

I I ( t | | | 1 I I < I I t 1 t 1 I I I I I till I 11)1 


TT T LV s RCCAATTCCCAiACPUTPiTi "GTgCCCCGGCTGGTTTTGCGATTCTAAAATGTAACGATAAAAAGTTC 
6890 6900 6910 6920 6930 6940 6950 

720 730 740 750 760 770 780 

AATGGAACAGGACCAYGTAGAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACT 

: ; : ; : : : : ; : - s J - - - ! ! : 5 :5 :! ! ! 5 5 5 : ! 

AGTGGAAAAGGATCATGTAAAAATGTCAGCACAGTACAATGTACACAT6GAATT AGGCCAGT AGT ATCAACT 
GSSC 6970 6980 6990 7000 7010 7020 


7S0 800 810 820 830 840 

CAACTGCTGTTGAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGCCAATTTCACAGACAATGCT 
;;;;;;;;;;; ; ; I ! ! I ! I ! ; ! * ....... * . * ...... 

CAAt;TGCTGTTAAATGGCAG' FCTAGCAGAAGAAGAGGTAGTAATTAGATCTGAGAATTTCACTGATAATGCT 
jqw.q 70 t\Q 7050 7060 7070 7080 7090 

860 870 880 890 900 910 920 930 

AAftPCCATAATAGTAf AGCTGAAf 'CAATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAA 

; ;'I; ; ; ; ; ; : ; :::::::: ; ; : ... ; :::::: 

AAAACCATCATPGT ACATCTG AATGAATCTGT ACAAATT AATTGT ACAAGACCCAACT ACAATAAAAGAAAA 
7100 " 7110 7120 7130 7140 7150 7160 

940 950 360 970 980 990 

AGTATCCe- r 4Tr:CARAGGGGACCAGGGAGAGCATTTGTTACAATAGGAAA-AATAGGAAATATGAGACAA 

;;;;;;;; : ; ... i : s : .. . : : 

AGGPTACA W• 4H-mnrfli -rr^flR ftGCATTTTATACA ACAAAAAAT AT AAT PGGAAGT AT AQGftC A A 


7170 


7 ISO 


7200 


7210 


7220 


7230 


1000 1010 1020 1030 1040 1050 1060 1070 

GCACfVTTGTAACATTAGTAGAGCAAAATGCAATGCCACTTTAAAACAGATAGCTAGCAAATTAAGAGAACAA 


GCAmTTGTlWjnTTAGTAEinGCAriAATGGAATGACACTTTAAGACAGATAGTTAGCAAATTAAAAGAACAA 

7240 7250 7260 7270 7280 7290 7300 

1080 1030 1100 1110 1120 1130 1140 

TTTGGAAATAATAAAP.CAATAATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTTT 
I ; I J ! ! ! ! ! I 1 1 ! r I I ! ! ! ! ! I I ! *ttiriii*****««**i*i**»** 1 *** 1 * 

TTT-p^pKqypiAAACAATAGTCTTTAATCAATCCTCAGGAGGGGACCCAGAAATTGTAATGCACAGTTTT 

7310 7370 7330 7340 7350 7360 7370 

1150 11GO 1170 1180 1190 1200 1210 

AATTGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGG-TTTAATAGTACT 


l l l t l l l l 


■ l t l t i i t > i l i 


|| | | | | t t t t I t ! t I I I t I 


I I I I I t I t 


AATTGTGGAGG^a AATTTTTCTACTGTAATACATCACCACTGTTTAATAGTACTTGGAATGGTAATAATACT 
73SO 7390 7400 7410 7420 7430 7440 


1720 l230 1240 1250 1260 1270 1280 

TOS-pig rACTGAAC:GGTCAAATAACACTGAAGGAAGTGACACAATCACACTCCCATGCAGAATAAAACAA 


TGGAATAATACTACAGGGTCAAATAACAAT- 
7450 7460 7470 


—ATCACACTTCAATGCAAAATAAAACAA 
7480 7430 7500 


1290 1300 1310 1320 1330 1340 1350 

rTTATAAAGATGTGGCAGGQAGTAGGftftAAGCAATGTATGCCCCTCCCATCAGCGGACAAATTAGATGTTCA 


i i i i i i i t i i i i i i i i i i i i i i i i > i i ■ 1 1 i i > 

i • i t i i t i t i t i i i i i t i i t t i i i i i i ■ i < > > > 


ATTATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATTGAAGGACAAATTAGATGTTCA 
751 0 7520 7530 7540 7550 7560 7570 


1360 1370 1380 1390 1400 1410 

TGAAATATTACAGGGCT6CTATTAACAAGAGATGGTGGTAATAACA-AC-AATGGGTCCGAGATCTTC 

: : : : : : : : 

TCAAATAT fACAGGGC f ACT ATT AACAAGAGATGGTGGT AAGGACACGGACACGAACGACACCGAGATCTTC 
7580 7590 7600 7610 7620 7630 7640 

1420 1430 1440 1450 1460 1470 1480 1490 

AG ACCTGGAES At a'G AGATA'l 6i AGGG AC AATTGGAG AAGTGAATT AT AT AAAT AT AAAGT AGT A AA AATTGAA 


t i t » i i i i i i 


i i i i i i i i i 1 i t 


AGACCTGGAGGAGGA6ATATGAG6GACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAACAATTGAA 
7650 7660 7670 7680 7690 7700 7710 7720 

1500 1510 1520 1530 1540 1550 1560 

CCATT AGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGA 

.. ..... • • i i i t t i i i i 


i i i t t i i t 


i i t i t i i i i i t t f i i i i i 


i i i i i i i i 


CCATTAGGAGTAGCACCCACCAAGl-.iCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGCG-ATAGGA 

7730 7740 7750 7760 7770 7780 


1570 1580 1590 1600 1610 1620 1630 

GCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCACGGTCAATGACGCTGACGGTACAG 


i i i i i i i i 


GCTCTGTTCCTTGGGTTCTTAGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAGTGACGCTGACGGTACAG 

7790 7800 7810 7820 7830 7840 7850 7860 

1640 1650 1660 1670 1680 1690 1700 

GCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAACAGCAT 


i i t t i t i i i i i ■ > i i i t i 


ill i i t i i i i i i i i t i 


GCCAGACTATTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCCATTGAGGCGCAACAGCAT 

7870 7880 7890 7900 7910 7920 7930 

1710 1720 1730 1740 1750 1760 1770 

CT6TTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAG 


i t i i i i i i i i i i i 


ATGTTGCAACTCACAG - 


GGCATC AAGCAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAAG 





7940 7950 7960 7970 7980 7990 8000


1780 1730 1800 1810 1820 1830 1840 1850 

Ci^TCP^CPiGCTCCTEnKGPTTTGuGGTT^CTCTQGFlRFiRCTCATTTGCACCPiCTGCTGTGCCTTQGARTQCT 

, I I !! i ! I 1 I ! I I ! I I I ■ • : : : ' * • ' * ' ' ' ' 1 ‘ 1 ' 

ef=>TnftftCf=^t3CTCCTS0iC5STTTT®3eQTT(2CTCTQQAaAACTCATTTSCftCCACTACTSTQCCTTQQAATQCT 

8010 ~ 8020 8030 8040 8050 8060 8070 

I860 1870 1880 1830 1900 1910 1920 

AGTTGGAGTAATAAAICTCTGeAACAGATTTGGAATAACATGACCTGGATGGAGTGGGACAGAGAAATTAAC 


AGTTGGAGTAATAAATCTCTGGATGATATTTGGAATAACATGACCTGGATGCAGTGGGAAAGAGAAATTGAC 

8080 SOSO S100 3110 8120 8130 8140 

1330 1940 1950 1960 1970 1980 1990 

AATTACACAAGCTTAATACATTCCTTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTA 

.lilt ill 

, , , , . 1 i i i * t • > i * t i i i j * * > | | j ; j ! I | I ! ! i i i iii i i i i i i i i i . i i i i • ' 

AATTACACAP.GCTTAAfATACTCATTACTAGAAAAATCGCAAACCCAACAAGAAAAGAATGAACAAGAATTA 

8150 8160 8170 3180 8190 8200 8210 8220 

2000 2010 2020 2030 2040 2050 2060 

TTGG AATTAGA’l 'AAATGGGCA AGTTTGTGGAATTGGTTT AACAT AACA AATTGGCTGTGGTATAT AAAAAT A 

t i i i i i t i ' t i i i ‘ « » i • J i i j J J j | J j J J J | j [ | j j [ , , , , , , , , , , , , , i i i i i i i i i i i > * i i » » * » 

TTGGAATTGGATAAATGGGCAAGTTTGTGGAATTGGTTTGACATAACAAATTGGCTGTGGTATATAAAAATA 
3230 3240 3250 3260 8270 8280 8290 

2070 2030 2090 2100 2110 2120 2130 

TTCfiTAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTT 

., , , , , i i » i t i i i i i t i i i i < » * * ' • ‘ 1 1 ' ' ' ' ' 1 ' ' 1 ' ' ' ' ' 1 ‘ 1 ! ! 


t i i i i t i i i i i i i i * « * * « * < * 1 

t i i i i i i t t * i i i • i i < < ■ *■* i 


TTCATAATGATAGTAGGAGGCTTGGTARGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTT 
3300 8310 8320 8330 8340 8350 8360 

2140 2150 216C 2170 2180 2190 2200 2210 

AGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACAGGCCCGAAGGA 

; ; ; : : . .. . 

AGGCAGGGATACTCACCATTGTCGTTGCAGACCCGCCCCCCAGTTCCGAGGGGACCCGACAGGCCCGAAGGA 

8370 8380 8390 8400 8410 8420 8430 

2220 2230 2240 2250 2260 2270 2280 

ATAGAAGAAGAARGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCCTTAGCACTTATC 

.. .1.1 1 1 I I I I I I I I I I I l I I I I I I I I I I I I 1 

I I I I I I I I I I I I I 


ATCfiiAAGAAGAAGGTGGAGAGAGAGACAGAGACACATCCGGTCGATTAGTGCATGGATTCTTAGCAATTATC 

8440 8450 8460 8470 8480 8490 8500 

2290 2300 2310 2320 2330 2340 2350 

TGGGACGATCTGCGGAGCCTTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAG 

. i i i .it i ! : , i . i i i j iii ;; !l !!!!!!!!!! I ! ! ! ; !!!!!!!!!!!!! ! ! ! I ! I 1 ! ! ! 

TGGGTCGACCTGCGGAGCC—TGTTOCTCTTCAGCTACCACCAC-AGAGACTTACTCTTGATTGCAGCGAG 

8510 8520 8530 8540 8550 8560 8570 

2360 2370 2380 2390 2400 2410 2420 

GATTGTGGAACTTCTGGGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAG 

I;..,.;;;,.;;.;;;;;;;;:;;;;;;;;;;;; I I I I I I 1 1 : I i i i : : : : : : : : : : : : ; : : : : : : : : : : 

GATTGTGGAACTTCTGGGACGCAOiGGGGTGGGAAGTCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAG 

8580 8590 S600 8610 3620 8630 8640 


2430 2440 

TCAGGAACTAAAG 

i i i t i i i ' > i * ' 1 

i i i t i i i t » i i t i 

TCAGGAACTAAAG 
8650 8660 X 
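Each alignment block above pairs the query with a database sequence and places a row of markers between them where the residues agree. The sketch below shows how the match, mismatch and gap counts reported for an alignment could be tallied from two aligned strings. It assumes "-" marks a gap and "|" marks a match; the function names and the second toy sequence pair are hypothetical and not taken from the search program.

```python
# Sketch: tally an alignment and rebuild its marker row.
# Assumptions: "-" is the gap character, "|" marks identical residues.


def alignment_counts(query, subject, gap="-"):
    """Return (matches, mismatches, gap_columns) for two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gap_columns = 0
    for q, s in zip(query, subject):
        if q == gap or s == gap:
            gap_columns += 1
        elif q.upper() == s.upper():
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gap_columns


def marker_row(query, subject, gap="-"):
    """Return a row with "|" under identical residues and spaces elsewhere."""
    return "".join(
        "|" if q != gap and q.upper() == s.upper() else " "
        for q, s in zip(query, subject)
    )


if __name__ == "__main__":
    # Final block of the alignment above: both sequences are identical.
    print(alignment_counts("TCAGGAACTAAAG", "TCAGGAACTAAAG"))
    # Hypothetical pair with one mismatch and one gap.
    print("ACGT-A")
    print(marker_row("ACGT-A", "ACCTGA"))
    print("ACCTGA")
```

This version counts gap columns individually, which may differ from a program that counts each run of gap characters as a single gap.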
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genomic mRNA 
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) 3373 

tat, rev, nef subgenomic mRNA


IVS
tat, rev, nef subgenomic mRNA intron 1


IVS
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1 174 

a 

c 809 g 805 t 


ORIGIN 
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Score 
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o. oo 

194 
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X 10 20 30 40 50 60 70 

ATGAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGGGGTGGAAATGGGGCACCATGCTCCTTGGGATA 


t i ( i 


ATGAGAGCSAAGG-GEATCAGGAAC-iAATTGT-CAG-CACTTGTGGAGATGGGGCACCATGCTCCTTGGAATG 
X 430 500 510 520 530 540 550 


80 90 100 110 120 130 140 

TTGATGATCTGTAGTCr'CTACAGAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAGGAAGCA 


TTGATGATCTGT AGTGCTGCAGCP.AACTTGTGGGTCACAGTCT ATT ATGGGGT ACCTGTGTGGAAAGAAGCA 
560 570 580 590 600 610 620 


150 ISO 170 ISO 190 200 210 

ACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTACATAATGTTTGGGCCACACAT 


i i i i i < i i i i i i i 


> i ■ i t t i t i 


i i i i i t t i i i i c i i i 


ACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGCACATAATGTTTGGGCCACACAT 
G30 640 650 660 670 680 690 


220 230 240 250 260 270 280 

GCCTGTGT ACCC ■ ACAGACCCCAACCCACAABAAGTAGTATTGGTAAATGTGACAGAAAATTTTAACATGTGG 


! I I ? t I > t I I I 1 t I I t t t 


I I I I 





290 300 31.0 320 330 340 350 360 

Aft< a, ft .ATGACATEeTAGAACAGATG:CATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTA 


; ; , .. . * i i i * . i t i t t i * i t i i » i » « * • • i i * t i .... 

AftjqpjATA RC ATlIGTAG ARC AGATGC AT6: AGS ATATA ATCAGCTT ATGGG ATCAAAGCCT AA AGCC ATGTGT A 
770 7 BO '750 300 BIO 820 830 

370 3S0 330 400 410 420 430 

P(APf[ T AACCCCACTCH GTGTTAGTTTAA AGTGCACTGATTTGGGGAATGCT ACT AAT ACCAAT ACT A GT A 
III ! I ! I I ! J I I I I ! ! ! ! ! ! i ! ! J ! I 1 » I I ! I > I ! > > * * * * * 1 11 1,1 1 * ' 1 1 1 1 1 1 

AAACTAACCCCACTCTGTGTTAC7TTAftATTGCACTGATTTGAATACTAATAATACTACTAATACTACTGAA 
3.0 Q50 860 870 880 830 900 910 

440 450 460 470 480 490 500 

ATACCAATAGTAGTAGCGGG.GAAATGATGATGGAGAAAGGAGAGATAAAAAACTGCTCTTTCAATATCAGCA 

! ! II!!! ! ! * I . . .. * * ........ • • ■ ...... .. 

CTATCAATAATAGTAG7TTGGGAACAACG—GGGTAAAGGAGAAATGAGAAACTGTTCTTTCAATATCACCA 
920 930 340 350 360 370 980 

510 520 530 540 550 560 

CAA9NATAAGA0GTAAG6T0CAGAAAGAATATGCATTTTTTTATAAACTTGATATAATACCAATAG- 

: : ; ; 

CAA0CATAAGAGATAAGGTGCAGAGAGAATATGCATTGTTTTATAAACTTGATGTAGAACCAATAGATGATA 
9S0 1000 1010 1020 1030 1040 1050 


570 580 590 600 610 620 630 

ATAATGATAOTACCAGC-TATACGTTGACAAGTTGTAACACCTCAGTCATTACACAGGCCTGTC 

, , - . i • i • i : t t • . iiii i . . i t ii ... i » t . i . » i » i i i i i i i . * * . i 

t , ; ; i i i i t t i i i i ttii ...it .1 I... i . . i i i i i i . . . i i . . . i i » 

ATAAAAAT ACT ACCAACAACACCAAAT A fAGGTTGAT AAATTGT AACACCTCAGTCATT ACACAGGCCTGTC 
1060 1070 1080 1090 1100 1110 1120 

640 650 660 670 680 690 700 

CAAAGGT ATCC1 rTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTAAAATGTAATA 

( , , , , , , f , t t I I I I ! 1 » I < I I I I I 1 I 1 » * * * I * I • I 1 lilt. t t t I I I I I t I I I I t I I I .till 
t , ; t i . t t i t t i t t ) I I t t . i I i I t I . t ' * * * * * I * i • • i i . i t . t I i i i . I . iiill.l I . t i l 

CAAAGGT ATCCTTTGAGCCAATTCCCATACATTATTGTACCCCGACTGGTTTTGCACTTCTAAAGTGTAACG 
1130 11.40 1150 1160 1170 1180 1190 

710 720 730 740 750 760 770 

ATAAGACGTTCAATGGAACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAG 

ATAAGAAGTTCAATGGGACAGGACCATGTACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAG 
1200 1210 1220 1230 1240 1250 1260 

780 790 800 810 820 830 840 

TAGTATCAACTCAACTGCTGTTGAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGCCAATTTCA 


i i i : i t t i i i i t 


t ■ i i i i . t i t i i i i i . i 


... i t i i t 


TAGTGTCAACTCAACT GCTGTT AAATGGCAGTCT AGCAGAAGAAGAGGT AGT AATT AGATCTGAAAATTTCA 

1270 1280 1230 1300 1310 1320 1330 1340 

850 360 870 880 890 900 910 

CAGACAATGCTAAAACCATAATAGTACAGCTGAACCAATCTGTAGAAATTAATTGTACAAGACCCAACAACA 


l i i . t . t 


i . i i t . . i i i . l 


CGAACAATGCTAAAACCATAATAGTACAGCTGAATGTATCTGTAGAAATTAATTGTACAAGACCCAACAACC 

1350 1360 1370 1380 1390 1400 1410 

920 330 940 950 960 970 980 

AT AC AAGAAAAAGT ATCCGTATCCAG A.GGGGACCAGGGAGAGCATTTGTT ACAAT AGGAAAAA-T AGGAA 

: : : : : 

ATACAAGAAAAAG-GGTAAC--GCTAGGACCAGGGAGAGTATGGTATACAACAGGAGAAATACTAGGAA 

1420 1430 1440 1450 1460 1470 

990 5.000 105.C 1020 1030 1040 1050 1060 

ATATGAGACAAGiCACATTGTAACATTAGTAGAGCAAAATGCAATGCCACTTTAAAACAGATAGCTAGCAAAT 

* t i * t i • i t i i i i i t i t . t i : i t i i . i » r • t » » iti. ii* i i . . i i . iiiit.ii.fi. t i 

AT^l TAn rAGArnrArftATRr^AATAACAGTTTACAACArSATAGnTAnAACrT 




1480 


1 Wi 


1500 


1510 


1520 


1530 


1540 


1550 


1070 1080 1090 11OO 1110 1120 1130 

TAAGABAACAATTTGGAAATAATRAAACAATAATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAA 

... , i , , , , , i i i t t i i i i i • i t i t i t t i > t t i i i i t t i i t i i > t i i t i i i ■ 

i l l t t l i l i i ' ■ l l I I l i l l l i l l l t t i t i i t t l t i t i i i i i i l i I i i l i I l i i i l l t l l i l t t i 

TAA6AGAACAA7 TTGG-GAATAAAACAATAGCCTTTAATCAATCCTCAGGAGGGGACCCAQAAATTGTAA 

1560 1570 1580 1530 1600 1610 1620 

1140 1150 1160 1170 1180 1190 1200 

CGC AC AGTTTT AATTETlnG AGGGG AR TTTTTCTRCTGTAATTCAACACAACTGTTTARTAGTACTTG-G 

I ! I i t r * 1 ! i i i i i t t » i i I i i t i i i 1 i t i i i I i i i i i i I < i t * » I t 1 i I i i i i i t i i i ) I i i t 

TGCACAGTTTTRnTTGTGGAGGGEiRATTTTTCTACTGTAATTCAACACAGCTGTTTAATAGCGCTTGGAATG 
1630 1640 1630 1660 1670 1680 1630 

1730 1220 1230 1240 1250 1260 

TT-TARTAGTACTTGGAG-TACT-GAAGGGTCAAATA-ACACTGAAGGAAGTGACACAATCACACT 


I I i t I i 


t ill I 


ill i l i i l t 


l l l I t t l I 


TTRCTRGT AAT6GT ROTTGGRGTC iTT RCT RGRRR6—CRRRRRGRCRCTG-GRGRCRTTRTCRCRCT 

1700 1710 1720 1730 1740 1730 

1270 1280 1290 1300 1310 1320 1330 

CCCATGCABAAT RRRRCRRTTTATAAACATGTGGCRGGAAGT RGGRRRRGCRRTGTRTGCCCCTCCCRTCRG 

I i I ! [ I I I I ! I I ! I * ! I I \ ! I I I i I i \ i ! I I I I I i ! i t I i i i i i i i i i i i i i i i i i i i i i i i i i i i 

CCCRTGCRGRRT RRRRGARRTTRTRRRCRGGTGGCRGGTTGT RGGRRRRGCRRTGTRTGCCCTTCCCRTCRR 

1760 1770 1780 1790 1800 1810 1820 

1340 1350 1.360 1370 1380 1390 1400 

CGGRCAAATTAGATGTTCATCAAAT RTTRCRGGGCTGCT RTT RRCRRGRGRTGGTGGT RAT RRCRRCRRTGG 

14 11 | | I , I t 1 . t » I I I 1 1 1 I I I I I 1 I I I I I * • I * 1 I 1 I I * I I * > I I I 1 I I I I I 1 I 1 I III I 

till I t I t I ) t I t l I I I t I I I I I I I I I 1 I I I I I I t I I » 1 I • I > I.I I I I I t I 1 III I 

AGGACTAATTAGATSTTCATCAARTATTACAGGGCTGCTATTAACAAGAGATGGTGGTGGTGAGAACCAGAC 

1830 1840 1850 1860 1870 1880 1890 1900 

1410 1420 1430 1440 1450 1460 1470 

GTCCGAGA i'CTTGAG ACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATRRRTATRAAGT 

,,11)1,4,4 I I I I I t t I 1 I I I I I l I I I I I I I I I I I I 1 I * I * <1 I * * • * • * * * * * » ' * ' 1 ' 1 ' 1 1 < » ' ' 
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CACCRAGATCTTTAGACCTGGAGGAGGAGATRTGRGGGRCARTTGGRGRRGTGRRTTATATAAATATARRGT 

1910 1920 1930 1340 1950 1960 1970 

1480 1.490 1500 1510 1520 1530 1540 1550 

AGTAAAAATT6AACCRTTAGGRG7AGCACCCACCARGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGC 

,,•,11111 | > ) I I I I I I I t 1 t ! I I I 1 I 1 1 I I I t t I I I I I t t t t I I t I I I I I I I I I I I I I 1 I I I I I I I » t t 
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AGTAAAAATCGAACCATTAGGAGTAGCACCCACCARGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGC 

1980 1930 2000 2010 2020 2030 2040 


1560 1570 1580 1590 1600 1610 

AGTGGGAA-TAGGAGCTTTGTTGCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCACGGTCAAT 

I t ! i i t i t i i i t i i i i t i i i t i i i i i i i i i i i i i i t t i i i t i i i i i i i i i i i i i i I t i i « i i I i I i 

AGTGGGAATGCTAGGAGCTATGT7CCT7GGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAACGTCAAT 
2030 2060 2070 2080 2090 2100 2110 


1620 5.630 1640 1.650 1660 1670 1680 1690 

GAC6CTGAC6GTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTAT 


t l ! t I t I t 


t , I t I I I I I I t t I I I I 1 1 t t I I I I t I I l l t l l l l t I t I I I I I I I I l 


GGCGCTGACGGT ACAGGCCAGACAATTATTGTCTGGTATAGTGCAACAGCAAAACAATTTGCTGAGAGCTAT 

2120 2130 2140 2150 2160 2170 2180 


1700 1710 1720 1730 1740 1750 1760 

TGAGGCGCAACAGCA1CTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGT 

4 , | t I , I I I I I t I I I I 1 I I I I I t I l t t I I I I l > t I I I I I I I I I I I I I I I I I I I I I I * I I ‘ • » I* 1 t II I I 1 

TAACGCGCRRCRGCATCTGTTGCARCTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGT 

2190 8200 2210 2220 2230 2240 2250 2260 


1770 1790 1790 1.800 1310 1820 1830 

GGAAAGATACGTAAAnGATCAACAGGTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGC 

! ! ! ! I ! ! ! I ! I ] ! J ! ’• I I ! ! ! I • ' I I I I ! ’ ’ > i 

GGAAAGA1ACCT! ARAGGATGARCAGCTnCTAGGGTTTTGGGGTTGGTCTGGARAACTCRTTTGCACCRGTGC 







2270 


2280 


2300 


2310 


2320 


2330 


1840 


1870 


1880 


1890 


1900 


TGTG:nCTTGGAPTGCTAGTTeGAGTAATAP.ATCTCTGGAAC^GATTTGSAATAACATSACCTQ(3ATGGABTB 


* 1111*11 * *.* * • * . . * < < • * * • » * • ■ * * < • » < ■ 

TGTGCCTTBGPnTeCTABTTGGAGTAATAAAACTCTSGATCAGATTTGSAATAACATSACCTGGATGSASTS 
2340 2350 2360 2370 2380 2390 2400 

1910 1920 1930 1940 1350 1960 1970 

SGACAGAGAAATT-AACAATTACACAABC.TTAATACATTCCTTAATTGAASAATCSCAAAACCASCAASAAAA 

; ; ! ! ! ! ; I ! i ! ! < '< '> < '< < ‘ ! i > > ■ ■ • ■ > * • ■ *•»***■ . .. • • * ■ • • 

GGACAGAGAAATrEiACAATTACACACACTTAATATACACTTTAATTGAAGAATCGCAAAACCAACAAGAAAA 
2410 2420 2430 2440 2450 2460 2470 

1980 1990 2000 2010 2020 2030 2040 2050 

GRR7 GftACAAGAATTATTGGAATT RGAT AAATGGGCAAGTTTGTGGAATTGGTTT RRCRT RRCARRTTGGCT 

ilti till :i* i • i i i i t ; i t i i i i t i i i i i i i i i i i i t < * i i i i i i ■ i i i t i i i i t i i i i i i 

ill, till i*i i i i i i i t i - i : i i i i i i i i t i i t i i i i i i i » i i i i i i i i i i i ( i ... 

GAATCAACA.UnGAR.CTATTGCAATTAGATAAGTGGGCAAGTTTGTGGACTTGGTCTGACATAACAAAATGGCT 
2480 24GO 2500 2510 2520 2530 2540 

2060 2070 2080 2090 2100 2110 2120 

GTGG TAT AT ARARAT'ATTCAT AATGAT P.GT AGGAGGCTTGGT AGGTTT AAGAAT AGTTTTTGCTGT ACTTTC 

| , , ; , , | | | | J 1 | I 1 | I I I I 1 | | I 1 , I I I I I I t I I I 1 t I I ..I I 1 I I I t I I I I I I I I • I I 1 t I I I 
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GTGGl'ATAr ARARAT A TTCAT AA’I GAT AGTAGGAGGCTTGAT AGGTTT AAGAAT AGTTTTTGCTGTGCTTTC 
2550 2560 2570 2580 2590 2600 2610 2620 

2130 2140 2150 2160 2170 2180 2190 

TATRGiTGAATAC AGTt AEGCAGGGATAT TCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACC 

, , , , i i i , i i i , i i ) , i i i ( , i i , i i i t i t i • .. i t i i t i i i iiiiiiii .. 

i , , , i i i i , i j i i i i t i i i i i i t i t : i i i i i i t t i i i i i , i t i i i t i , , i i i , , i i i i i i i 1 , f i t i 

T AT RGTGRAT AGAGTT AGGCAGGRAT ACTCACGATT ATCGTTTCAGACCCTCCTCCCAAACCCGAGGGGACC 
2630 2640 2650 2660 2670 2680 2690 

22C0 2210 2220 2230 2240 2250 2260 

CGACAGGCCCGAAGGA.ATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGG 

t i i : t : , i 1 l t i t i t i i iii'ii: i i t i i i t t i i i i i I , , i i I i i * i , ■ i i i i t i I i i I I I i ■ i 


i t i i ■ * i 


CGACAGGCCCGAAGGAACCGAAGAAGGAGeTGGAGAGAGAGGCAGAGACGGATCCACTCGATTAGTGCATGG 
2700 2710 2720 2730 2740 2750 2760 

2270 22S0 2290 2300 231O 2320 2330 

A7CCTT AGCftCn TATCTGGGACGATCTGCGGAGCCTTGTGCCTCTTCAGCT ACCACCGCTTGAGAGACTT AC 

t t I i i i f I i i l t i , i , i i t ; t i , i , t I i I l l i till).I , , i l l i l l l I l l l l t I l l l ,,, l I 
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CTTCTTA6CACTTGT CTGGGAC6ATCTGCGGAGCC—TGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTAC 
2770 2780 2790 2800 2810 2820 2830 

2340 2550 2360 2370 2380 2390 2400 2410 

TCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGGAATC 


i i * i i i : i 


i i t t i i i > i 


TCTTGAT! GTAGCGAtiGATTGTGGiAACTTCTGGGACGCAGGGGGTGGGAAGTCCTCAAATATTGGTGGAATC 
2840 2850 2860 2870 2880 2890 2900 

2420 2430 2440 

TCCTACAGTATTG1GASTCAGGAA0TAAAG 

lilt i i , i i i , i i : * i t i i » i i i i i i i I 

TCCTGCAG' T ATTG6AGTCAGGAACT AAAG 
2910 2920 2930 X 


4. KUNZ- 153- C.L33, SEE 

HIVPV22 Human immunodeficiency virus type 1, isolate PV22,


LOCUS HIVPV22 9770 bp ss-RNA VRL 15-JUN-1989

DEFINITION Human immunodeficiency virus type 1, isolate PV22, complete genome
(H9/HTLV-III proviral DNA).
This sequence for a H9/HTLV-III virus was determined from one
complete proviral clone [1]. Additionally, several cDNA clones of
the viral RNA were sequenced for comparison with the entire
proviral sequence. The differences between cDNA and proviral DNA
are extensive and are listed in the Sites Table as variations. The
authors believe that the variations may be due in part to different
strains in the H9/HTLV-III cell line, because it was established by
infection with material from several AIDS patients.

With the addition of g at 2111, gag cds and pol cds are very close
to those of HXB2, BRU, and related HIV viruses.
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pep t 

7SS3 

2337 

gag polyprotein precursor 


pept < 2094 5141 pol polyprotein (NH2-terminus uncertain; AA at 2094)

pept 5086 5864 vif protein


pep c 
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5840 

vpr protein 


pen: •t 
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tat protein? exon 2 (first expressed 
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842* 
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tat protein? exon 3 (AA at 8422) 


pept 

SO 1 5 
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rev protein? exon 2 (first expressed 
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rev protein? exon 3 (AA at 8423) 


pept 
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pept 
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tat cds intron 2 
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rpt 
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R repeat 5’ copy 
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R repeat 3’ copy 


b i rid i ng 
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395 

Sp1 binding site III

binding 397 406 Sp1 binding site II

binding 406 417 Sp1 binding site I


binding 

S45 
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primer (Lys-tRNA) binding site 


vet a ant. 

510 

510 
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variant 
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g in provirus; a in cDNA [13 


revision 2111 2112 gg in [2]; g in [1]

variant 5716 5716 g in provirus; a in cDNA [1]

variant 5992 5992 a in provirus; g in cDNA [1]

variant 6007 6007 c in provirus; t in cDNA [1]

variant 6047 6047 c in provirus; g in cDNA [1]

variant 6051 6051 c in provirus; a in cDNA [1]

variant 6055 6057 acg in provirus; gaa in cDNA [1]



va?" i am? 

6 lOD 

6108 

va i iant 

G 120 

6120 

veriant 

G1 25 

6126 

var i ant 

5136 

6136 

v/ar i ant 

G233 

6235 

variant 

6352 

6352 

var i ant. 

G7G0 

6760 

var ant 

7090 

TOGO 

var iant 

7100 

7100 

ve t :i. ant. 

7134 

7135 

v/ar ' i ant 

71 S3 

7184 

vav ■ J. ant 

7193 

7199 

variant 

7284 

7235 

variant 

7303 

7303 

v/ar 1 ant. 

751 1 

751 1 

v/ar iant 

7533 

7533 

var1 ant 

75 S 6 

7586 

variant 

7643 

7648 

v/ar i ant 

3133 

8139 

v/ar iant 

3143 

3143 

var i ant. 

8222 

8222 

variant. 

3263 

3269 

v/ar i ant. 

8285 

8285 

vat' i ant 

3376 

8376 

v/ar I ant. 

3381 

83 S 1 

vat iant 

3476 

347G 

variant 

0866 

8863 

variant 

3979 

£973 

var i aril? 

8990 

8390 

vat i ant 

8595 

oocq 

v/ar iant 

9031 

S031 

var i ant 

9291 

9291 

var i anl? 

9295 

S235 

wav i ant 

9305 

□303 

v/ar i ant 

9548 

9548 

3 i gt iB 1 

3654 

36 53 

prov/ 

10 

3761 

cel 1 

1 

5 

ce 1 1 

9762 

9770 

BASE COUNT 

3435 

B 1786 

ORIGIN 

432 bp i 

upstream i 


t in provirus; c in cDNA [1]
a in provirus; c in cDNA [1]
ac in provirus; gtaac in cDNA


a in provirus; 
t in provirus5 
g rn provirus; 


m CDNA [13 
in cDNA [13 
1n CDNA [13 


t in provirus; a in cDNA [1]
c in provirus; t in cDNA [1]
a in provirus; g in cDNA [1]
ca in provirus; ac in cDNA [1]
gt in provirus; aa in cDNA [1]
a in provirus; g in cDNA [1]
aa in provirus; gc in cDNA [1]
a in provirus; c in cDNA [1]
a in provirus [1]; c in cDNA
t in provirus [1]; a in cDNA
c in provirus [1]; t in cDNA
a in provirus [1]; g in cDNA
a in provirus; c in cDNA [1]
t in provirus; c in cDNA [1]
a in CDNA [13 
[13? g in cDNA 


a in provlrus LlJ 
a in provirus? c 
t in provirus? c 
g in provirus; a 
a xn provirus [13 
g in provirus [13 
a in provirus [13 
a in provirus [13 
a in provirus [13 
a in provirus [13 
c in provirus? t 
a in prov/irus; c 
c in provirus [13 
a in provirus [13 


; t. in cDNA 
; g in cDNA 
; g in cDNA 
? g in cDNA 
? g in cDNA 
in cDNA [13 
in cDNA C13 
; a in cDNA 
? g in cDNA 
? g in cDNA 
? t in cDNA 
; a in cDNA 
? c in cDNA 


Initial Score 
Residue Identity - 

Gaps 


t in provirus [1]; g in cDNA

g in provirus [1]; t in cDNA

g in provirus [1]; a in cDNA

g in provirus [1]; c in cDNA

mRNA polyadenylation signal 
HIV-1 proviral DNA 
human cellular DNA 
human cellular DNA 
2376 g 2172 t 
aa 111 site. 


1S77 Optimized Score = 2180 Significance = 0.00

SS% Matches = 2247 Mismatches = 146

.106 Conservative Substitutions = 0


X 0 20 30 40 50 60 

ATGAUAGFi i~A~AGGAGAA— ATATCAGCACTTGTGGAGA—TGGGGGTGGAAATGGGGCAC—CATGCTCCTTGG 
; ; ; ; ; ; j ; ; j ; ; ; III! !!!!!! 111!! ! ! ! ! ! ! ! ! ! 

CTAA/rAGttAi'-.i3iAGCAC--AAGACAGTGGCAAT-GAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGG 

6240 0,250 6260 6270 6280 6290 6300 

70 SO 90 10O llO 120 

GATATTG A' i 6-AT ~CT -i5:T AGTGCT AC AG AA AA ATTGT-GGGTC ACAG-TCTATTATGGGGTAC— 

; ; ; ; ; : : ; ; : : ; : : : ; : ; : : : : : : 
GGTCGAGA'fSGCGCACCATaCTCCTTGGGATGTTEATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAG 
631O S320 8330 6340 6350 6360 6370 


6330 


130 140 S 50 160 170 180 

-CT_GTGTESAA-EGAASCAA—CCACCA—CTCTATTTTGTGCATCAGATGCTAAAGCATATGAT 

;: :! ii : : : i i ; : :; : : 

TCTATTATOEGGTACCTGTCJTEGAAGGAAGCAACCACCACTCTATTTTG-TGC-ATCAGATGCT 
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* ! ! 1 ! * | \ * ' * * ! ! \ r . t 1 ■ • ' ' . I • I I . ! i r t » I t I 1 I 1 i I I i i I i i t t 1 t i I I i i t I I I t I i I I I I I 

[ -A _:f\. iPGA. O -74, : :i OI. : TOO I I 'I-'O I'CTGTCAA FTT C ACGGAC AATGCT A A A ACC AT AAT AGT ACftGGTG AAG 












1.300 


1310 


1320 


1330 


1340 


1230 
330 


330 ;-oO 310 920 930 940 950 

ITSTASAA.A ’ rAATTOff AC.AAGACCCAAlV-'ACAATACAP.GAAAAAGTATCCGTATCCAGAGGGGACCA 


i i I <: i i i t 


1 CTGT.''6P.PATTA:ATTGTACnAGACXXAACAAGAATAC^AGAAAAAAAATCCGTATCCAGA0Ci(3GACCA 

'Vyr-o : -t?o 3. 3YO 1380 1390 1400 1410 

;-»:o 300 PAO 930 1000 1010 1020 

0:3CP'GAtHC*'rrro.vT T AC AAT.At.GP.AAA A r AGGiAAA") ATGAGACAAGCACATTGTAACATTAGTAGAGCAAAA 

j J ! ! ’, ! * ! 1 * ! ! } ! ! ! I ’ I ! ! ! ! ’ I I ! ! I I ! * I * J t ! ! 1 ! J t : i i t i » t i i i i > i i i » i » • i i • » i » t » t » 

CiGRP;! 7 7'*77 TV-. 1 n t ' 3 "V H .V 4 .V' f AGEAAAT ATGAGACAAGCACATTGT AACATT AGT AGAGCAAAA 

1420 1 £ 131 i ; V 40 1 4- 5 O 1460 147 0 1480 


1 050 
TGCMA 


;.050 10S0 1070 1080 1090 

33 A-'.i-iCAui.-;TA l .07 (3 7CAA; 77TAAGAGAACAATTTGGAAATAATAAAACAATAATCTTT 


iGiGAA FGO'i 3333 i' FAA AAC AGATAOlCTAGC AAATT AAGAGAACAATTTGGAAAT AAT AAAACAAT AATCTTT 
14CI 0 v 1510 1.520 1.530 1540 1550 1560 

1 l OQ i i i ; i ;i ;>i - l 1 30 1 140 1 150 1 160 1 170 

ITGT AACGC ACAGTTTT AATTGTGGAGGGGAATTTTTCT ACTGT 

, • , , . i , , ! I t I t I 1 I I I I t t I I t l I I I I I < .) 1 ! I I I I I I 1 ( I I I I I I I 

; , . , , i , ; , ; , i , * i t t i t i i t t i t t i t i i i i i i i t i I i i i » » > » i i • * > * • ' 1 * * * * * 1 1 1 1 1 1 1 1 1 1 

WC-:; JvAi-ivuSA; ICCAtFiAAATTGTAACGCACAGTTTTAATTGTGSAGGGGAATTTTTCTACTGT 

iffO 1580 1330 1G00 1610 1620 1630 

i. ir-: r 11 so 1200 1210 1220 1230 1240 

Aj.‘.,TCAAUA l.PA.-JVtn lY-V-Yi r.ACTTGQTTTAATAGTACTTGGAGTACTGAAGGGTCAAATAACACTGAA 


T i 'V? -Of 


orvi f.'CTTnGTTTAATAGTACTTGGAGTACTGAAGGGTCAAATAACACTGAA 
1SS0 1670 1680 1690 1700 


1.250 ? .000 1270 1280 1290 1300 1310 

i.tC ,ri> vT3p,Cf IHTCPC ACTCCCP' iECA6AATAAAACAATTTATAAACATGTGGC-AGGAAGTAGGAAAAGCA 


t t t i t t i t i i i i i i t t t i i i i i 

i i i t i i i i i i i ■ t i t i i i i i i i 


GfrPPCfrGACAC.PfffCPCACTr.CCATGr.CAQAATAAAACAATTTATAAACATGTGGCAGGAAGTAGGAAAAGCA 
1710 1720 7; 30 1740 1750 1760 1770 

1 'HO 1340 1350 1360 1370 1380 

ATG'i r-VnsCi 1 j'J’t i :'Y: CAlvTlC-iACPAATTAGPTGT fCATCAAATATTACAGGGCTGCTATTAACAAGAGAT 


t i • t ( t i t t t i i i t i i i i t i t 


t i i i i i t t i i i i i 


Vtlo 

■; ^ ■ ' {■ 1 

300 

18 5.0 

1820 

1830 

1840 

* 

1 ■ -GO 

' pi. ;i o 

1420 

1430 

1440 

1450 


AA f. 4-7 ii 7 1JA. Vi GlO -.7 L 


V ;nvG7-.TU I 7CAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGT 


llr G 7. 7 4 f AA i V A C V vVL*7V' *T icj ’ f COG AG A' fCTTCAGACCTGGAGGAGGAGAT ATGAGGGACAATTGGAGAAGT 
1850 \ZBC U/f'O 5.880 1890 1900 1910 1920 

1430 ' ■ 77 > 1 4. '.l G 430 1.500 1510 1520 1530 

RftA i I'ATA i 'AAA {“ ft TA.A AGT AGT A A A AATTG A ACCATT AGG AGT AGC ACCC ACC A AGGCAA AG AGA AGAGTG 
J J I : ) J ! ; 1 J I 1 ! ! ' ! ' J I 1 1 1 ; J I ! ! ! 1 ! I i ! ! I I I I I i i i i i i i t i i i i i i » i t < t i * i i i i i * t i i * t 

J3AA1 rATATPOA IPAAGT 37 T APiPAATTGAPCCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTG 


i : i 4 ) 

’. 940 

1 SCO 

1360 

1970 

1980 

1990 

1 54\ ■ 

1550 

1560 

1570 

1580 

1590 

1600 


gts: f ',oAt:,' ago-'.gtggeaataegagctttgttccttgggttcttgggagcagcaggaagcact 

t t t i .. t » I I.I I I I • t t 1 I t I I I I I I I 1 t I t t t t » 

, t » I ) j i i r f : ! F ■ : * r i r I f t i ! : i J I t I i t i I i I I I i I I ... t ... I I I t t t I I I l t ! I I I 

f-'7t; YVTAGPCPiPAPPP.GAGCAtViTGGGrAP.TAGGlAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACT 
VXfOO 1.0 7020 2030 2040 2050 2060 


A 71-.'..: -17C. ,1. 


1230 1640 1650 1660 1670 

AC1 v.U T'u ACGGT ACAGGCCAGACPlATT ATTGTCTGGT AT AGTGCAGCAGCAGAAC 






■>70 


20:30 

2100 

21 lO 

2120 

2130 

■ • .0 

j irVV. : 

1 '700 

1710 

1720 

1730 

1740 


AA7T l GCTGAGGGC7 ATTGAGGCGCAAUAGCAfCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAG 
| * * J * \ J | ! ! ! \ \ ! ! I 1 I ! \ I \ \ I I 1 ! J I I I I ! I ! ! I ! ! I i i « i i i » * i » i i i i * 1 1 1 1 * 1 * * 1 1 * 1 ’ * ’ * 

TaC'I'ijifttit'.iijiJTF'.TT^fU'siartiCftACAGiCftTC.TGTTGCAACTCACAGTCTGSGQCftTCAftQCAQCTCCAQ 
i t , pr-.o v 1 BO 2170 2 ISO 2180 2200 


17 'jf' lYI'O r?’70 1780 .1780 1800 1310 

GCPiAGAAT i'‘wTi.:;r:';f;Tl;;Ti36mP.ri( a lTP.CCT<=iftf=lCaGftTCAACAGCTCCTSGGK3ftTTT®3QQTTGCTCT6QAAAft 
! j ! ' ' | * t J | ! ! ! I ! ! ! ! ! ! ! ! ! ! ! ! * * ' I ! * ’ t I ! * J ! ! i » • * < * • ■ 1 1 * • * 1 * * 1 * 1 1 1 1 1 * 1 * 1 1 1 1 1 1 1 
':T;RR^RR''P :CTi: ! -:MCl'G : 'THcnR;*\^r5!^l''PiCC;TAAPlOG3PlTCPlftCftGCTCCTGQGGATTTGGG(3TTGCTCTeeAAAA 

22*J s 2220 2220 2240 2250 


2260 


2270 


2280 

1820 lo30 1 . 8*10 1850 i860 1870 1880 1860 

CTCP;rTT8iCftr:c;ACl-8:CTG76CO*r r6£RRTGCTRSTT6GROTRRTRRRTCTCTGGRRCR6RTTTGGiRRTRRC 

| * ! I t ! I ! ! ! ! ! ! I I I ! ! ! ! ! ' I ! I ! I t » * • » t i i i * i i i t i t * * * « » * » ' » ' > 1 * > • > * 1 * • 1 * 1 1 11 

CTPnTTTrtfJfia^nCTGOTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGGAATCAC 

• ^300 2310 2320 2330 2340 2350 


1 ! r .:0‘ ‘ 1 810 1820 1330 1340 1350 1360 

ATGAl JC T SiCi AT5GAGTGGGACPG AGP.AATT AACAATT ACACAAGCTT APT ACATTCCTT AATTGAAGAATCG 

; \ I \ 1 1 ! ! ! ! ! ! ! ! ! ! i ! I * i ! ! i I » I * i > ! * t t i » * • * 1 • > * * • • 1 . .* i ***** *. . 

RGOR* 'fSTCiORTC^ORO I ^TiS6ROR8iRt*nRRRTTRRORRT TRCRCRRGGTTRRTRCRGTCCTTRRTTGARGRATGC3 
2JE0 23YO 2350 2330 2400 2410 2420 


iy7(5 1.8;'JO 1S0O 2000 2010 2020 2030 

Op/^^cAGCAAGAAAAEAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTT 


I I t I I 


r<»PttACCr2~CAP6AAAAGAATGAACAAeAATTATTGSAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTT 

2430 2440 2450 2460 2470 2480 2430 


2040 2050 2060 2070 2080 2030 2100 

AP.r P, fAACAAA f 1'GGC fGTGGTATA T AAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATA 
: i : 

A AC A' f AAC:-; A AT i 'GG'JTGTGG'TAT ATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTT AAGA AT A 
25 Oo 25 3'.') 2520 2530 2540 2550 2560 


oii't 2 170 -130 2140 2150 2160 2170 

G i ! ! TTGC T GfiACTi YCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTC 

t , , . , ; 1 I 1 I t I t 1 ! 1 I I ; I t 1 : t ! I : t 1 t I I I I I I I t 1 i I I t I I < * I I * I t I ... 

, , , j r I I , f : ; : r i T T : .. 

GT7 T TT6CTGTAOT7 TCTGTAGTGAAT AGAGTT AGGCAGGGAT ATTCACCATT ATCGTTTCAGACCCACCTC 
2570 25?O 2590 2600 2610 2620 2630 2640 

2180 2130 2200 2210 2220 2230 2240 2250 

CCAACCCCGA3G20ACCCGACAGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCC 
J | ! ! J J I 1 * ! I * ■ J • : i i i t i i t i » * t i i i i t i t t i i i i i i i i > » < * < < > * * * » • * * * * * 1 1 1 1 1 1 * * 1 1 
CCAATCCCGAGGiiGACCCGACPGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCC 
2630 2660 2670 2680 2630 2700 2710 


2260 7270 2280 2230 2300 2310 2320 

AT7 1 JC.ATTAG'rGAACGGATCCTTAGCACTTATCTGGGACGATCTGCGGAGCCTTGTGCCTCTTCAGCTACCA 


i i i t i t t i 


i l i t i i t i i i 


ATTGGATT AGT i :AACGUATCCTT P.bsCACTT ATCTGGGACGATCTGCGGAGCC—TGTGCCTCTTCAGCT ACC A 
2720 7730 7740 2750 2760 2770 2780 


7330 2340 2350 2360 2370 2380 2330 

CCGi OTGACAGP:-Tl A5TC7'"GATTGTAACGA'iGATTGTGGAACTTCTGGGACGCAGGGGGTGGGAAGCCCT 

! ! I I ! I I ! ! ! !!!!!! ! ! ! ! I ! ! i I ! ! ! I ! ! ! 1 ■ 1 • 1 1 • 1 ..... ' 

CQTiCTTGAfaAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAGGGGGTGGGAAGCCCT 
2780 2300 231O 2820 2830 2840 2850 


2400 ' : ;4 i O 2420 2430 X 

CAAATATTGGTGGAA’l CTCU f'ACAbiTAT TGGAGTCAGGAACTAAAG 

! J ! I I ! ! ! ! J ! i I ! ! ! * ! i ! ! > i * * < * * < 

^V--T<°iTTi'v : r-'-‘:T:*4- .if --OT CTCCTRC-RiRTATTGGAiaTCACaGiAGCTAAAGa 
Human immunodeficiency virus type 1, NY5/BRU (LAV-1)

LOCUS HIVNL43 9709 bp ss-RNA VRL 15-JUN-1989


DEFINITION Human immunodeficiency virus type 1 (HIV-1), NY5/BRU (LAV-1)
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FEATURES 

p£pt 

i-'' c ■■ ° 

pep t 

pept 


L CC. **’• A V'’ A i I 13 

1 (bases 1 to 9709)

Buckler,C.E., Buckler-White,A.J., Willey,R.L. and McCoy,J.

Unpublished (1988).

full staff_review

2 (bases 1 to 9709)

Adachi,A., Gendelman,H.E., Koenig,S., Folks,T., Willey,R.,

Rabson,A. and Martin,M.A.

Production of acquired immunodeficiency syndrome-associated
retrovirus in human and nonhuman cells transfected with an
infectious molecular clone
J. Virol. 59, 284-291 (1986)
full staff_review

3 (sites; revisions of [1])

Buckler,C.E.

Unpublished (1988)
full staff_review

Clean copy of sequence [1] kindly provided by Chuck Buckler, NIAID,
Bethesda, MD, 24-JUN-1988. The construction of pNL4-3 has been
described in [2]. pNL4-3 is a recombinant (infectious) proviral
clone that contains DNA from HIV isolates NY5 (5' half) and BRU (3'
half).

The site of recombination is the EcoRI site at positions


pept 


The length and sequence of the vpr coding region correspond to
those of the BRU, SC, SF2, MAL and ELI isolates. The vpr coding
region of these isolates is about 18 amino acid residues longer
than the vpr coding region of the IIIb isolates. In HIVNL43, this
shift is due to a single base deletion (with respect to the IIIb's)
at position 5770. The sequence at this position is "atttc" in
HIVNL43 and "attttc" in HIVHXB2.
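
The effect of a single-base deletion of this kind can be sketched in a few lines. The two reading frames below are hypothetical; only the "attttc" / "atttc" motif is taken from the entry, and the helper name translate is an assumption made for illustration. The sketch shows only that removing one base shifts every downstream codon, so a stop codon that is in frame before the deletion can be read through after it.

```python
from itertools import product

# Standard genetic code, built in TCAG order (illustration aid, not part of the entry).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate complete codons from the first base; stop after a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# Hypothetical sequences: the second is the first with one "t" removed.
with_extra_t = "ATGATTTTCTGAGGC"   # contains "attttc"
one_t_deleted = "ATGATTTCTGAGGC"   # contains "atttc"

print(translate(with_extra_t))    # frame ends at the in-frame stop codon
print(translate(one_t_deleted))   # the shifted frame no longer sees that stop
```

In this toy case the deletion removes the in-frame stop, which is the same kind of change the entry describes for the longer vpr coding region.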

The original BRU clone, sequenced by Wain-Hobson, et al. (Cell 40,
9-17 (1985)), and the BRU portion of the pNL4-3 recombinant clone
are different clones from the same BRU isolate.

Two of the revisions reported in the FEATURES produced changes in
amino acid sequences. The revision at position 2421 changes one
amino acid residue from 'R' to 'G' in the pol coding region. The
revision at positions 8995-9000 changes three amino acid residues
from 'AHT' to 'VTP' in the nef coding region.
from to/span description 

790 2292 gag polyprotein
< 2085 5096 pol polyprotein (NH2-terminus uncertain; AA at 2085)
5041 5619 vif protein
5559 5849 vpr protein
5830 6044 tat protein; exon 2 (first expressed exon)
8369 8414 tat protein; exon 3 (AA at 8370)
5969 6044 rev protein; exon 2 (first expressed exon)
8369 8643 rev protein; exon 3 (AA at 8371)
6061 6306 vpu protein
6221 8785 envelope polyprotein
8787 9407 nef protein









pre-msg 455 9626 genomic mRNA

pre-msg 455 9626 tat, rev, nef subgenomic mRNA

*1 VS 

/■ 1 /I 
f —r ^ 

5776 

tat* rev* nef mRNA intron 

1 

IVS 6045 8368 tat cds intron 2

IVS 6045 8368 rev cds intron 2

IVS 6045 8368 tat, rev, nef mRNA intron 2

LTR 1 634 5' LTR

LTR 9076 9709 3' LTR

rpt 454 550 R repeat 5' copy

rpt 9529 9626 R repeat 3' copy


b i rid i ng 

37*7 

3QB 

Sp1 binding site III 


binding 

vi-iC 

357 

Spl binding site II 


bindino 

399 

403 

Spl binding site I 


binding 636 653 primer (Lys-tRNA) binding site


site 5743 5748 EcoRI site of recombination

recomb 5743 5744 HIV-1 isolate NY5 DNA end/HIV-1 isolate LAV DNA start

revision 182 183 at in [3]; tg in [1]

revision 194 194 g in [3]; c in [1]

revision 2421 2421 g in [3]; a in [1]

revision 8995 9000 tcacac in [3]; ctcaca in [1]

revision 3415 3413 c m C315 a in c 1 3 

signal 9602 9607 mRNA polyadenylation signal

BASE COUNT 3421 a 1756 c 2366 g 2166 t

ORIGIN 5' terminus of NY5 LTR
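
A BASE COUNT line can be cross-checked against the sequence length with a short sketch such as the one below. The function name base_count is an assumption for illustration, and the four counts are as read from the scan (the c count is partly illegible and is taken here as 1756). The short test string is simply the first 15 bases of the query sequence shown in the alignments.

```python
from collections import Counter


def base_count(seq: str) -> dict:
    """Count a, c, g and t in a nucleotide string, ignoring case."""
    counts = Counter(seq.lower())
    return {base: counts.get(base, 0) for base in "acgt"}


# Counts as read from the BASE COUNT line (c is an assumed reading of the scan).
reported = {"a": 3421, "c": 1756, "g": 2366, "t": 2166}
total = sum(reported.values())
print(total)  # should equal the sequence length given on the LOCUS line

# The same tally applied to a short stretch of sequence.
print(base_count("ATGAGAGTGAAGGAG"))
```

If the total and the LOCUS length disagree, one of the scanned digits has probably been misread.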

Initial Score = 1723 Optimized Score = 2163 Significance = 0.00

Residue Identity = 89% Matches = 2226 Mismatches = 179

Gaps = 04 Conservative Substitutions = 0
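
The listing does not state how the search program derives the residue identity from these counts. As a rough sketch under one common convention (matches divided by all aligned columns, with gap columns optionally included), the figure can be approximated as follows. The function name percent_identity and the choice of denominator are assumptions, and the gap count is not clearly legible in the scan, so it is left as a parameter.

```python
def percent_identity(matches: int, mismatches: int, gap_columns: int = 0) -> float:
    """Percent identity over aligned columns (assumed convention, see lead-in)."""
    aligned = matches + mismatches + gap_columns
    if aligned == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * matches / aligned


# Matches and mismatches as printed above; gap columns omitted because the
# printed gap count cannot be read reliably.
print(round(percent_identity(2226, 179), 1))
```

With gaps left out the result is somewhat higher than the printed identity, which would be consistent with gap positions being counted in the denominator, although the listing itself does not say so.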

X 10 20 30 40 50 60 

AT ti.MGAGTGAAGGAGAMATATCAGCA—CTTGTGGAGATGGGGGTGGAAATGGGGCACCATGCTCCT—TGG—G 

itt ft 1 1 t 1 1 t t til 1 itit 111 i iii 1 1 1 1 it ii 1 1 lit 1 

tit ii 1 i 1 t 1 1 1 tit 1 tilt 111 1 tit t 1 .I 1 1 ill 1 

TTGRTAGAC f XATAGAARGRGCAGAAGACAGTGGCAATGAGAGTGAAGGAGAAGTATCA—GCACTTGTGGAG 
X 6190 6200 6210 6220 6230 6240 6250 

70 80 30 100 HO 120 

ATATTGAT -GATCTGn AG-TGCTAC-AGAAAAATTGTGGGTC—ACAGT—CT-ATTATGGGG 

t I till II t t I I I I I 1 I I t t tit till! I I I I I I I 

It till II t t I I t t 1 t 1 I I t III I t I I I I I ■ I I I I 

ATGGRGGTGGAAATGGGGCACCATRCTGCTTGBGAT ATTGATGATCTGT AGTGCT ACAGAAAAATTGT—GGG 
6260 6270 6280 6290 6300 6310 6320 

130 140 150 160 170 180 190 

TACCTGTGTlJGAA.uGAAGCAACCACCACTCTATTTTG—TGCATCAGATGCTAAAGCATATGATACAGA—GGT 

I lilt II I III » t I lit! II I III !••••• 

I Till 11 t III If I till II I III I I I I I I 

TCACAGTCTATT AT RG.GG-^ACCTGTGTGGAAGGAAGCAACCACCACTCTATTTTGTGCATCAGATGCT 

6330 6340 6350 6360 6370 6380 6390 

200 210 220 230 240 250 

A CATAATGit T —TGGGCCACA-CA1 GCCTGTGTACCCACAGACC—CCAACCCACAAGAAGTAGTATTG 

■ lit tii t t t 111 t t 1 it 1 t 1111 11 t t t t 1 1 ttt 

1 iii 111 , t t it* lit 11 1 1 1 1 1 1 t 1 1 1 t t t t lit 

AAAGCAT—ATGA fACAGAGGTACATAATGTTTB—GGCCAGACATGCCTGTGTACCCAC—AGACCCCAACCCA 
6400 64-10 6420 6430 6440 6450 6460 

260 270 280 230 300 310 320 

GTAAATGTSiACARAAAATTTTA.ACATGTG-GAAAA-ATGACATGGT AGAACAGATG—CATGAGGAT AT A 

t 1 11 11 t 1 t 1 1 1 1 i 1 1 t l 1 l 1 iitt t 1 tii 1 ill ill 1 11 1 

CAAGAAGT AS—TATTGGTRA-ATGT GACAGAAAATTTTAACAT—GTGGAA—AAATGACAT-GGTAGA 

6470 6430 6430 6500 6510 6520 

330 340 350 360 370 380 390 

ATCAG-TT'T 'ATG-EBATCAAAGCCTAAAQCCATGTG-TAAAA—TT AACCCCACTCTGTGTTAGTTT AA—AG 

i tii t t t t >>i> t i ■ t iii t i iii iii iti > i t i t i iiii 

I t : i t tit till tl I t lit I i III ill til I I I I 1 1 till 

A-CAGATGCATGAGEATATRA-TCAGTTTATGGGATCAAAGCCTAAAGCCA-TGTGTAAAATTAACCC 

6590 6540 6550 6560 6570 6580 6530 



400 410 420 4'^W^BPF' 440 450 

TGCA.CTGAT TTGG-C.G AATGCTACTAAT-RCCAATACTAGTAATACCAATAGTAGTAGCGGGGAAATG 

• ; ; ; ; ; ; ; III I ! ! II '< ! ’> ! ! ! : : ; : : : ' ' ’ ' ' ' ' ‘ ’ 1 ' ‘ ' 

CPiCl CTGTFITTAGTTTARAR TCTiC-AGTGATTTGAAGAATBATACTAATACCAATAGTAGTAGCGGGAGAATG 

6GO0 bsio GS20 6630 6640 6650 6660 

460 470 480 490 500 510 520 

ATGATGGRGAAAeCRI /^CiATAAARAACTGCTCTTTCAATATGAGCACAAGNATAAGAGGTAAGGTGCAGAAA 

; ; ; ; ; ; ; ; ; ; ; ; ; ! i : ; i i ; ; i ; ;;;;;;; 

RTRRrGGRGAAAGGAGAGATAAAAAACTGCTCTTTCAATATCAGCACAAGCATAAGAGATAAGGTGCAGAAA 

6670 6680 6630 6700 6710 6720 6730 

530 540 550 5S0 570 580 590 600 

"grrtrtgcrtttttttrtrrrcttgrtrtrrtrccartagatrrtgrtrctrccrgctatacgttgacaagt 

;;;;;;;;;;; : : :: : : : : : 

OAA7ATGCATTCTTTTATAAACT1 GATATAGTACCAATAGAT AR-T RCCAGCT AT AGGTTGAT AAGT 

(4740 6750 8760 6770 6780 6790 

GIG 620 630 640 650 660 670 

TCn Y-.AGACCTGAG' FCATTACACAGGCCTGTGCAAAGGT ATCCTTTGAGCCAATTCCCAT ACATTATTGTGCC 


i i i i i i t t t i t i : i i t t i 


i l i t i i i t i 


TGTAACACCTCAGTCATTACACAbirjCCYGTCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCC 

6800 6810 6820 6830 6840 6850 6860 6870 

S30 630 700 710 720 730 740 

CCGGCTGG7TTTGCG: ATTCTAAAATGT A AT AAT AAGACGTTC AATGGA ACAGGACC-ATGT ACAAATGTCAGC 
I:;;::;;::;!:;:::;;;:: i ; i ! i : ! ; ; i ! : : i i : > i ’ '> '• ; ;: ; : •'••■*■■*■■■■■ • ' ' ■ 

CCGGCTGG’I'TTTGCbl A'FTCT AAA ATGTAATA A FA AG ACGTTC AATGGA ACAGGACCATGT ACAAATGTCAGC 

6880 6890 6900 6910 6920 6930 6940

750 760 770 780 790 800 810

ACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTGAATGGCAGTCTAGCAGAA 
: :::::::::::::::::: 
ACACTACAATGTACACATGGAATCASGCCASTAGT ATCAACT CAACT GCTGTTAAATGGCAGTCTAGCAGAA 
6950 6960 6970 6980 6990 7000 7010

820 830 840 850 860 870 880

GAAFsiAGGTAGTAATTAGATCTGGCAATTTCACAGACAATGCT AAAACCAT AAT AGT ACAGCTGAACCAATCT 

.■ i i i i i t i i lift 


i i i i i i i • i i i i i i i » i i t i t i i i i 

i i i i i i i i < i i i i i i ( t • • t ■ iii) 


GAAGATGTAGTAATTAGATCTGCCAATT'I C ACAG AC A ATGCT AAAACCAT AAT AGT ACAGCTGAAC AC ATCT 

7020 7030 7040 7050 7060 7070 7080 

890 900 910 920 930 940 950 960

GTAf'AAATTAAT FGiTACAAGACCCAACAACAATACAAGAAAAAGTATCCGTATCCAGAGGGGACCAGGGAGA 

.. ... i • i i ■ ■ i i i i i i i i i i 


i t t i t t i i ; ; t i i i i t i i 


i i i i i i i * > * i » * 1 * * 1 


GTAC A A ATT AA'i FG.TAC A AGf 1CCC A ACAACA AT ACAAGAAAAAGT ATCCGT ATCCAGAGGGGACCAGGGAGA 
7090 7100 7110 7120 7130 7140 7150 

970 980 990 1000 1010 1020 1030 

GCA'1 TTGTTACAATAGCAAAAATAGGAAATATCsAGACAAGCACATTGTAACATTAGTAGAGCAAAATGCAAT 

. . , . . , . . , , , , , , , , , , i . , . , , i i i i i l i i i * * i i i i 1 1 i * 1 1 1 » * 1 ' > 1 ii> 

I J ! ! ': ! 1 1 1 | ! t i i t t r i i : : i i I I I » t t i t i i t i i i i i i I i I I ' I ' 1 1 * 1 ' ' * 1 * 1 1 ' * ' 1 1 1 ' * 1 * • * * 

GCA1TTGTTACAATAGGAAAAATAGGAAATATGAGACAAGCACATTGTAACATTAGTAGAGCAAAATGGAAT 
7160 7170 7180 7190 7200 7210 7220 7230 

1040 1050 1060 1070 1080 1090 1100 
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0030 0040 3050 3060 8070 8080 8030 


1900 1910 1920 1930 1940 1950 1960

CTGGATGGRGTGGGACAGAGAAATTAACAATTACACAAGCTTRATRCRTTCCTTRRTTGRRGRRTCGCRRRR 


CTGKATGGAGn GGGACAGAGARATTAACAATTACACAAGCTTAATACACTCCTTAATTGAAGAATCGCAAAA 
8100 8110 8120 8130 8140 8150 8160







1970 1980 1990 2000 2010 2020 2030

CCABCAAGARAACAATGAACR.ABARTTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTAACAT 

* , i ..j ! !;;! I I I !! !!! I ! I I I I J •< 1 * > .. 

GCP.GCR AG A AAAGR ATGA AC ARGRATT R.TTGG A ATT AGAT AAATGGGCAAGTTTGTGGAATTGGTTTAACAT 
8170 8180 8190 8200 8210 8220 8230 

2040 2050 2060 2070 2080 2090 2100 2110

AACR.AATTGGCTGTGGTATRfARAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTT 

;:! i i i ■ • i ; i!!;;!:: i::! i > > '<: > * ■ ■ ■ • • 
RRCR.0PiTTi5GCTGTGSTRTR.TRriR.RTT RTTCR.T RRTGRT RGT RGGRGGCTTGGT RGGTTT RRGRRTRGTTTT 
8240 8250 8260 8270 8280 8290 8300 8310

2120 2130 2140 2150 2160 2170 2180 

TGC ~l GTACTTTC7ATAGTGAAT AGAGTT R.GGCAGGGAT RTTCRCCRTT ATCGTTTCRGRCCCRCCTCCCRRC 
; j i ! i :::;!!!!!!!!!!!!!!!!!! I !!! I ! : : : : : ‘ ' 

TGCTGTACTTTCTATRGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAAT 

8320 8330 8340 8350 8360 8370 8380 

2190 2200 2210 2220 2230 2240 2250

CCCGAGGGGACCCGACAGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCG 

. . . . t i t i i i i i t » t i » i * i ? i i i ( * * > * * * * * * 1 1 * 1 1 1 ' 1 1 ! ! 1 ! ! ! ! ! ! ! ! ! t ! ! ! I ! ! ! I I ! I i i i i i 


CCCGRGGGGRCCCGACRGGCCCGRAGGR! 
8390 8400 8410 8420 8430 8440 8450


2260 2270 2280 2290 2300 2310 2320


i t i i t t t i 


lit.. < 


i i t i i t i t i i i i i i i ■ ■ i i 

i t i | r i i i i r t i i i i t i i i 


ATTAGTGAACG5ATCCTTAGCR.CTTA I CTGGGACGATCTGCGGAGCC-TGTGCCTCTTCAGCTACCACCGCT 
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HT.VHXB2CG Human i mmunodef iciency virus type 1 < HXB2) > comp 1 e 

LOCUS       HIVHXB2CG    9718 bp ss-RNA    VRL    25-SEP-1987

DEFINITION  Human immunodeficiency virus type 1 (HXB2), complete genome;
            HIV1/HTLV-III/LAV reference genome.

ACCESSION K03455 

KEYWORDS    TAT protein; acquired immune deficiency syndrome; complete genome;
            env gene; gag gene; long terminal repeat; pol gene; polyprotein;
            provirus; reverse transcriptase; trans-activator.

SOURCE      HTLV-III/LAV (isolate HXB2) proviral DNA.
  ORGANISM  Human immunodeficiency virus type 1
            Viridae; ss-RNA enveloped viruses; Retroviridae; Lentivirinae.
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Expression of the art gene protein of human T—lymphotropic virus 
type III (HTLV-III/LAV) in bacteria

J. Virol. 61, 633-637 (1987)
full staff_review

REFERENCE   31 (sites; inducible enhancer element)
  AUTHORS   Nabel,G. and Baltimore,D.
  TITLE     An inducible transcription factor activates expression of human
            immunodeficiency virus in T cells
  JOURNAL   Nature 326, 711-713 (1987)
  STANDARD  full staff_review


REFERENCE   32 (bases 5611 to 5611; revises [4])
  AUTHORS   Ratner,L.
  JOURNAL   Unpublished (1987) Washington U Med School, St. Louis, MO
  STANDARD  full staff_review


REFERENCE   33 (sites; long terminal repeat)
  AUTHORS   Patarca,R., Heath,C., Goldenberg,G.J., Rosen,C.A., Sodroski,J.G.,
            Haseltine,W.A. and Hansen,U.M.
  TITLE     Transcription directed by the HIV long terminal repeat in vitro
  JOURNAL   AIDS Res. Hum. Retroviruses 3, 41-55 (1987)
  STANDARD  full staff_review


REFERENCE   34 (sites; R orf)
  AUTHORS   Wong-Staal,F., Chanda,P.K. and Ghrayeb,J.
  TITLE     Human immunodeficiency virus: the eighth gene
  JOURNAL   AIDS Res. Hum. Retroviruses 3, 33-39 (1987)
  STANDARD  full staff_review


REFERENCE   35 (sites; sor)
  AUTHORS   Fisher,A.G., Ensoli,B., Ivanoff,L., Chamberlain,M., Petteway,S.,
            Ratner,L., Gallo,R.C. and Wong-Staal,F.
  TITLE     The sor gene of HIV-1 is required for efficient virus transmission
            in vitro
  JOURNAL   Science 237, 888-893 (1987)
  STANDARD  full staff_review


COMMENT     Sequence for [3] kindly provided in computer-readable form by
            L. Ratner, 18-AUG-1986.
            The HXB2 sequence is being used as a reference genome for all the
            HIV entries, because it has been derived from a demonstrably
            infectious clone. Hence not all of the "sites" references above
            were concerned with this isolate.

FEATURES        from  to/span  description
    pept         789     2291  gag polyprotein
    pept      < 2357     5095  pol polyprotein (NH2-terminus uncertain; AA at 2357)
    pept        5040     5618  sor 23K protein
    pept        5558     5794  R (ORF) protein
    pept        5830     6044  tat protein, exon 2 (first expressed exon)
                8378     8423  tat protein, exon 3
    pept        5969     6044  trs protein, exon 2 (first expressed exon)
                8378     8652  trs protein, exon 3
    pept        6224     8794  envelope polyprotein
    pept        8795     9167  27K protein (premature termination)
    mRNA         455     9635  HXB2 genomic mRNA
    pre-msg      455     9635  tat, trs, 27K subgenomic mRNA
    IVS         6045     8377  tat intron 1
    IVS         6045     8377  trs intron 2
    IVS         6045     8377  27K mRNA intron 2
    IVS          745     5776  tat, trs, 27K mRNA intron 1

IVS 

8045 

d-J i* f 

tat, trs intron 2 

    LTR            1      634  5' LTR
    LTR         9085     9718  3' LTR
    rpt          454      551  R repeat 5' copy
    rpt         9538     9635  R repeat 3' copy

binding 

377 

398 

Sp1 binding site III

binding

388 

387 

Sp1 binding site II 

binding 

4-Hi ~ ' 

40 G 

Sp1 binding site I

binding

/— *r r ■ 

O 

G53 

primer (Lys-tRNA) binding site 

rev i s 1 o'i i 58 i I 

SB 1 1 

g in C323? a in C43 

    signal      9611     9616  HXB2 mRNA polyadenylation signal

BASE COUNT     3411 a   1773 c   2370 g   2164 t
ORIGIN      435 bp upstream of PvuII site; 5' end of proviral genome.


Tnit.i.Mi ■= 1 65r-: Hint i m i ?p>H Srnrfl » .21.64_- Q^DD. 
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O 


Residue Ideniity ~ Sd% Matches = 2231 Mismatches 

Gaps == 10G Conservative Substitutions 

X 10 20 30 40 50 60

ATGAGAGTGA—AGGAGAA—ATATCAGCACTTGTGGAGA-TGGGGGTSGAAATGGGGCAC-CATGCTCCTTGG 

; : ;; i : ; : : i :! : ; ; : ; : ; : : ! ; i 

CTAATAGAAAeif'GCAGAAGACAGTGGCAnT-GAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGG 

6200 6210 6220 6230 6240 6250 6260

70 80 90 100 110 120

GATA TTGATS-AT-CT-E TAGTGCTACAGAAAAATTGT—GGGTCACAG—TCT ATT ATGGGGT AC 

j * ! ! ! t ! ' ' ! ! ! i ! ! ! I ! ! i i i * * * » • * 1 • • i « > < 

GGT^GAEATESGGCACCATECTCCTTEEGATGTTGATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAG 

6270 6280 6290 6300 6310 6320 6330 

130 140 150 160 170 180 

—OT-GTCiTGGAA-GGAAGCAA—CCRCCR—CTCTRTTTTGTGCRTCRGRTGCTRRRGCRTRTGRT 

: ; ; ; ; ; ; ;;;;!!! I I !'.!!!!!!!! ! ! ! • i ! ! ! ! ! ! 

TCTRTTRTfvFGG rACCTGTGTGGAAGGAAGCAACCACCACTCTATTTTG-TGC-RTCRGRTGCT 

6340 6350 6360 6370 6380 6390 

190 200 210 220 230 240

ft CRGRfaG-TRCRTR-RT—GT TTGGGCCACACATGCCTG—TGTRCCCRCRGR-CCCCRRCCCRC 

; ; ; ; ; ; ; ; : ; \ \ l !!!!!! I ! ! I ! • • ! • * * • * ■ * ■ • * • 

AAPlGC ATATGftl ACAGAGGTACATAATGTTTGGGC—CACA—CRTGCCTGTGTRCCCRCRGRCCCCRRCC-C 
6400 6410 6420 6430 6440 6450 6460

250 260 270 280 290 300 310

ftftrftAGTAGTRffTGCrfAAATGTGACAGAAAATTTTAACATGTGGAAAA—ATGAC ATGGT AGA ACAGATG—C 

: i:: :: : :::: : ::: : : : : : :::::::::::::::: 

A-CAAGAAGTA-GTA-TTG-GTAART—GTGACA-GAAAATTTTGACRT—GTGGAR—AARTGAC 

6470 6480 6490 6500 6510 6520

320 330 340 350 360 370 380 

ATGAGGA7ATAATCAQ- -TTTATG-GGATCAAAGCCTAAAGCCATGTG-TRRRR—TTRRCCCCRCTCTGTGT 

: : : : : :::::::: : : : : :: : : : : : 

ft T GGTAG.~;fY -CAGATGCATGRGGRTRT AR-TCAGTTT ATGGGATC AAAGCCT ARRGCC A-TGTGT 

6530 6540 6550 6560 6570 6580

390 400 410 420 430 440

TAG'l TTAA-AGTGCACTGATTTGG-GGAATGCT ACT ART-ACCA AT ACT AGT ART ACCAAT AGT AGT A 


ill it i 


i i i i i i i t i i i i l i i 


ftftftftTTAACCCCACTCTGTG TTAGTTTAAAGTGC—ACTGATTTGAAGAATGATACTAATACCAATAGTAGTA 

6590 6600 6610 6620 6630 6640 6650

450 460 470 480 490 500 510

HCGftKQAAATGAT6ATGGAGAAAGGAGAGATAAAAAACTGCTCTTTCAATATCAGCACAAGNATAAGAGGTA 


t i i i i t i i t i i i 


i i i i i i i ■ i i i i i i i 


GCGGGAGAATGATAATGGAGAAAGGAGAGATAAAAAACTGCTCTTTCAATATCAGCACAAGCATAAGAGGTA 
6660 6670 6680 6690 6700 6710 6720

520 530 540 550 560 570 580 

AGGTGCAGAAAGAAT A' i GCATTTTTTTAT AAACTTGAT AT AAT ACCAAT AGAT AATGAT ACT ACCAGCT AT A 

;i I!!!!! !l !!!!!!! I !!!!!!!!!!!:!! i: !!!!!!! ! 
AGG’i GCAGAAAGAATATGCRTTTTTTT AT AAACTTGAT AT AAT ACCAAT AGAT AATGAT ACT ACCAGCT AT A 
6730 6740 6750 6760 6770 6780 6790 

590 600 610 620 630 640 650 660

CGTTGACAA6 T TGTA ACACCTCAGTCATTACAGRGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATAC 

: ! : j ; ; ! ; ! ! ! ; I ; I I ! j ! i ! , , , , , i t i i i i r i t i t i t t * i * • ' > ' » » 1 ' • • * * * * • * ' * 1 1 * 1 * * 1 1 1 


GCT TGACAAGTlGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATAC 
6800 6810 6820 6830 6840 6850 6860

670 680 690 700 710 720 730

m1' Y rTGTGCCCi ^GECTRGTTTTGCGATTCTAAAATGTAATAATAAGACGTTCAATGGAAGAGGACCATGTA 





ATTATTGTGCCCCGBCTGGT rTTGCGATTCTAAAATGTAATAATAAGACGTTCAATGGAACAGGACCATGTA 
6870 6880 6890 6900 6910 6920 6930 6940

740 750 760 770 780 790 800

CPsfiATSTCAGCACAETACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTGAATGGCA 

;;!;;;!!!!!! I !! J !!!!!!! I !!!! 1 I !! I !!!!!!! I ! 1 I ..* * ‘ ' 

(^p^rSXCP^CACAGTACAATGTACACATGGAATTAGGCCAGT AGT ATCAACTCAACTGCTGTT AAATGGCA 
6950 6960 6970 6980 6990 7000 7010

810 820 830 840 850 860 870

GTCTAGCAGRAGARGAGGTAGTAATTAGATCTGCCAATTTCACAGACAATGCTAAAACCATAATAGTACAGC 

: : 

6TCTAGCRGRAEAASAGGTRGTAATTAGATCTGTCAATTTCACGGACAATGCTAAAACCATAATAGTACAGC 
7020 7030 7040 7050 7060 7070 7080 

880 890 900 910 920 930 940

TGAACCAA fCTSTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAAGTATCCGTATCCAGAGGG 

t t i t t t i t r i i t i t i i t i * i i i < i i t * t t » i i * i i i i « • > < * * < 1 * • * • 1 < * 1 i i » i p t i i i t i i t i i 

TGAACACATC7GTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAAGAATCCGTATCCAGAGAG 
7090 7100 7110 7120 7130 7140 7150


950 960 970 980 990 1000 1010 1020

GACCAGGGAGAGCATTTGTTACAATAGGAAAAATAGGAAATATGAGACAAGCACATTGTAACATTAGTAGAG 


i t [ i r t t t t t » ! i i t t 


i t i i i i t i t i t ( i t t i t t i » 


i t i i i t i i > i i i i i t i t i 1 i i t i t i i ( 


GACCAGGGA6AGCATTTGTTACAATAGGAAAAATAGGAAATATGAGACAAGCACATTGTAACATTAGTAGAG 
7160 7170 7180 7190 7200 7210 7220 


1030 1040 1050 1060 1070 1080 1090 

C A A A ATGCA ATuCCACTTT AAA: ACAG AT AGCT AGCA AATT A AGAG A AC AATTTGG AA AT AAT AAA AC AAT A A 

i ( i : i i t tit ititttititttiiiii iiitiiiiiiiiiiiitit 

, , ; , t I t i : i t i 1 t t t : t t t t i i t t 1 i I I i i I t t i I I t t t I ( I I t i i i I I I i i I i i i i t I I I i i i i t 

CAAAATGGAP.TAACACTTT AAAACAGATAGAT AGCAAATT AAGAGAACAATTCGGAAAT AAT AAAACAAT AA 
7230 7240 7250 7260 7270 7280 7290 7300

1100 1110 1120 1130 1140 1150 1160 

TCTT rAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTTTAATTGTGGAGGGGAATTTTTCT 

I t ; ; I t > i i i t t t t : i i t t i t ■ : : > i 1 t t i t I t i > t i i t i t i i i > I 1 I > I > I i t > i > > * > I > * > * I > * > • * t 

i t t j t t t i t i t ; t t p i i j t i t t i i t i t i i i t t i t i t i t i i i i i i i i i t t t t t i i i t i i t i I t i i t i i i t t I t 

TCVTTAA3CAft7CCTCAGGftGGSGACCCAGAAATTGTAACGCACAGTTTTAATTGTGGAGGGGAATTTTTCT 
7310 7320 7330 7340 7350 7360 7370 

1170 1180 1190 1200 1210 1220 1230

ACT GTA ATTC A AC AC AACTGTTT A ATAGTACTTGGTTT A AT AGT ACTTGG AGTACTG AAGGGTCA A ATA AC A 

i 1 ; t ( t i i i : i t i i t t t i 1 i i i i i t i t ! t r i t i I ! * i i ! i i i t i « < I < * * * 1 I * * * • * 1 1 1 • 1 1 * 1 1 * • 1 * * 

( , . . , , t t , , . , , , , j , ; , t , , J t i > t i < * i ■ i i i i t i i i i i t i i t i t < t i t i i i i t i i t i i i i t i i i 

ACTGTAAYTCAACACAACTGTTTAATAGTACTTGGTTT AATAGT ACTTGGAGTACTGAAGGGTCAAATAACA 
7380 7390 7400 7410 7420 7430 7440


1240 1250 1260 

CTGA AEG A( ’iG.TUAC ACA AT CACAC 


i t : i t : « i f 


CTGAAGGAAGTGACACAATCACCCT 
7450 7460 7470


1270 1280 1290 1300 

CCCATGCAGAAT AAAACAATTT AT AAACATGTGGCAGGAAGT AGGAA 

t i i i i i t i i l i l i t l i i t l l i l ( i i i l i » i i 1 i i • • » i i i i i p i l 

t i i t i i i i i i i t i ■ i i i i i i t i i i i t t t t i t i i i • t i i i i i i t i i 

CCCATGCAGAAT AAAACAAATTATAAACATGTGGCAGAAAGTAGGAA 
> 7480 7490 7500 7510 


1310 1320 1330 1340 1350 1360 1370 1380 

ARGCAATGYRTSCCCCTCCCATCAGCGGACAAATTAGATGTTCATCAAATATTACAGGGCTGCTATTAACAA 


I I t I t I I i I t t l l i t i I l 


AAGC AATGT ATGCCCCTCCGATCAGTSGACAAATT AGATGTTCATCAA AT ATT ACAGGGCTGCT ATT AACAA 
7520 7530 7540 7550 7560 7570 7580 

1390 1400 1410 1420 1430 1440 1450

GAEATGGTGGT AAT A ACAACAATGGGTCCG AGATCTTCAGACCTGGAGGAGGAGAT ATGAGGGACAATTGGA 

! \ ! ! \ ! I ! 1 * ! ! I \ * I !*!!!« ! ! 1 * I I ! ! ! I I I ! ! I I i t >*•<•*!*»** » ..i » 

GAGATGGYGCvTAATAGCAACAATGAGTCCGAGATCTTCAGACTTGGAGGAGGAGATATGAGGGACAATTGGA 
7590 7600 7610 7620 7630 7640 7650 7660 

1460 1470 1480 1490 1500 1510 1520

GAOCiTuAA fTPTOTAAATATARAG'l AGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAA 








f^pt. [PAP r 7 AT Af ^ AT A T A A AR 'T AGT AAAA ATTGAACCATT AGGAGT AGCACCC ACCAAGGC A AAGAG A A 
7670 7680 7690 7700 7710 7720 7730

1530 1540 1550 1560 1570 1580 1590

|..~pnTGGTSCAGAf:>!AGAnP.AAP>GACiCAGTGGGAATAGGAGGTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAA 

, , , , . . » . . . . . * » - * ► • ■ * ; ; ; j ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; 1 ! I i ’, ! ! I ! I ! ! I i ! ! I i * » 1 * * 1 * * • 1 1 

PO pf=TGt:ArvAi XK-- 4 AAAAASAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAA 

" 774 O 7750 7760 7770 7780 7790 7800 

1600 1610 1620 1630 1640 1650 1660

GCAC;TATGGG5GCACGraTCAATl3ACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGC 

R pp,'pyGOnC^ r. A^HGTCA A^G ACGCT G ACGGT ACAGGCC AG AC A ATTATTGTCTGGT ATAGTGC AGCAGC 
7810 7820 7830 7840 7850 7860 7870

1670 1680 1690 1700 1710 1720 1730 1740

AGAAOAATTT6GTGAGGEGTATTGAGGGGCAACAGCATGTGTTGGAACTCACAGTCTGGGGCATCAAGCAGC 

■ ■ ; ; : : : ’ 
pgp^r.AATTTGO rGAGGGCl ATTGAGGGGCAACAGC;ATCT GTTGCAACTCACAGTCTGGGGCATCAAGCAGC 

7880 7890 7900 7910 7920 7930 7940


1750 1760 1770

TCG-lGGCAAeOATGCT EGCTGTGGAAAGA 


1780 


1790


1800 


1810 


tit).. 


ARCA.0'5 A ATCCT AGCTGTGS AAAGATACCTAAAGGATCAACAGCTCCTAGGGATTTGGGGTTGCTCTG 

7950 7960 7970 7980 7990 8000 8010 8020

1820 1830 1840 1850 1860 1870 1880

ijrppppCTCATTTGCACCACTCCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGGA 
J I I ' t I I ! ! I I ! t I I » * I ' ' I 1 ' I > t t t t t t t r 1 1 « 1 i I 1 » I * » 1 1 ' * 1 * f 1 * ' * * ' ’ ’ ' ’ ’ 

[^pprTCATTTi^AGCACTRCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATCTGGA 

8030 8040 8050 8060 8070 8080 8090

1890 1900 1910 1920 1930 1940 1950

ATAAiC ATGACCTKG AT GO AS i'GGGACAGAGAAATTAACAATTAC ACAAGCTTAAT ACATTCCTTAATTGAAG 

. r : 1 1 ;; 1 i; !l I!!!! I !!!! 1 I ! I ! I ! I 1 1 1 ! I ! I ! ! 1 ! ! 

ATCACACGACCTGSA T GGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACACTCCTTAATTGAAG 
8100 8110 8120 8130 8140 8150 8160

1960 1970 1980 1990 2000 2010 2020

AATCGCAAAPCCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATT 

. . . ... i i i i i i i i i i i »( i i <•• * 1 1 1 1 * 1 1 1 1 1 

*. 

AATCGCAAAACCAGCAAGAAAASAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATT 

8170 8180 8190 8200 8210 8220 8230

2030 2040 2050 2060 2070 2080 2090 2100 

GGTTTAACATAACAAATTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAA 

; \ \ ; ; i ; ! [ : ; i : : ; : ; : : ; : : : ! : : : : : : : : : ■' : : > 

GGT'i'TAACATAACAAA'fTGGCTG‘1 GG TATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAA 
“ ..._ „ 8230 o *?no 


3270 


8280 


8300 


2110 2120 2130 2140 2150 klitoo mrv 

GPirYTPiGTTTT'i 'uCTGTRCTTTCTPiT RGTGFtPiT RGPiGTT ftGGCFiGGGAT RTTCRCCATTATCGTTTCRGPiCCC 
» * * i » * » * * • * * • • * ’ • * * • * * ' ! 1 * I ! I I j ! I I ! I 1 ! ! ! ! I ! ! I ! I ! I * I * • ' i * * * > 1 * * * ■ • • 1 1 1 • * 
j pGTT rTTOiCTOT; 'AGTTTCTAT AGTGAAT AGAGTT AGGCAGGGAT ATTCACCATT ATCGTTTCAGACCC 
8310 8.320 8330 8340 8350 8360 8370 8380 


2150 


2160 


2170 


2180 2190 2200 2210 2220 2230 2240

acctcccaacccc.ga.,:gggacccgacaggcccgaaggaatagaagaagaaggtggagagagagacagagaca 


I ( t I I I I I I I I I I I < < < t I 


ACrTCCCA'' v ' r| 7::CGARGGGAr,CCGACAGGCCCGAA6GAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACA 

8390 8400 8410 8420 8430 8440 8450


7>;o P'360 2270 2280 2290 2300 2310 

GATJcATTGGATmGT fiAAOTGATCCTTAGCACTTATCTGGGACGATCTGCGGAGCCTTGTGCCTCTTCAGC 






(SPT^ATTCEATTAGTGAAO-sEATCCTTGGCACTTATCTGGGACGATCTGCGGAGCC-TGTGCCTCTTCAGC 
8460 8470 8480 8490 8500 8510 8520

2320 2330 2340 2350 2360 2370 2380

TACCACCGCTTEAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAGGGGGTGGGAA 


1 ! ! \ | t 1 I ( « i • 1 I I I t t t 1 1 t t t 1 1 r 1 1 i 1 i 1 I * t I » I ' 


I ( t t 1 I < t I I I I I I t 


TAC'JAGCGCTTGAGAeACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCTGGGACGCAGGGGGTGGGAA 
8530 8540 8550 8560 8570 8580 8590


2390 2400 2410 2420 2430 2440 

GCCCTCA AATA'i FGGTGGAATCTCCTACAGTATTGGAGTCAGGAACT A AAG 

J I I J ! ,' ! ! ! I I ! ' ! ' I ! ' ! ' ! ! ^ I * 1 ! ! » 1 1 1 1 t 1 1 t i t 1 1 t 1 1 1 t » 1 1 1 1 

gccctgaaatattggtggaatctcctacagtattggagtcaggaactaaag 
8600 8610 8620 8630 8640 X
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DEFINITION Human irnmunndsf iciency virus type 1 * isolate SC (3’ end o-f genome). 
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SOURCE Human ivnmurndof iciency virus type 1 (HIV-1), isolate SC, provial 

DNA.

  ORGANISM  Human immunodeficiency virus type 1
            Viridae; ss-RNA enveloped viruses; Retroviridae; Lentivirinae.

REFERENCE   1 (bases 1 to 4273)
  AUTHORS   Gurgo,C., Guo,H.-G., Franchini,G., Aldovini,A., Collalti,E.,
            Farrell,K., Wong-Staal,F., Gallo,R.C. and Reitz,M.S. Jr.
  TITLE     Envelope sequences of two new United States HIV-1 isolates
  JOURNAL   Virology 164, 531-536 (1988)
  STANDARD  full staff_review

COMMENT     Kindly made available in computer readable form by Marv Reitz,
            N.C.I., Bethesda, MD 20892 U.S.A. This isolate was taken from a
            California AIDS patient in 1984. There is an in-frame stop codon at
            position 3212 of the envelope coding sequence; the nef cds is
            uncertain beyond position 4049. A stop codon, 'taa', in-frame with
            the nef sequence does exist at positions 4224-4226.


FEATURES        from  to/span  description
    pept           1      330  vif protein (partial; AA at 1)
    pept         270      580  vpr protein
    pept         541      755  tat protein, exon 2 (first expressed exon)
                3089     3179  tat protein, exon 3 (AA at 3090)
    pept         630      755  rev protein, exon 2 (first expressed exon)
                3089     3383  rev protein, exon 3 (AA at 3091)
    pept         772      964  vpu protein (premature termination)
    pept         935     3505  envelope polyprotein (premature stop at 3212)
    pept        3507     4226  nef protein
    pre-msg   <    1  >  4273  genomic mRNA
    pre-msg   <    1  >  4273  tat, rev, nef subgenomic mRNA
    IVS       <    1      437  tat, rev, nef subgenomic mRNA intron 1
    IVS          756     3088  tat cds intron 2
    IVS          756     3088  rev cds intron 2
    IVS          756     3088  tat, rev, nef subgenomic mRNA intron 2

ltf: 

■.3 / 

> 4273 

3’ LTR 


    rpt         4249  >  4273  R repeat 3' copy
    site        3212     3214  premature stop (tag) in env cds


BASE COUNT     1447 a    760 c   1053 g   1013 t
ORIGIN


Initial Score    = 1158   Optimized Score = 2139   Significance = 0.00
Residue Identity = 87%    Matches = 2194           Mismatches = 203
Gaps             = 97     Conservative Substitutions = 0
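The printout does not say how the residue identity percentage is derived from the match, mismatch and gap counts. The sketch below is one reading that reproduces the printed 87%: it assumes identity is the share of matches among all aligned columns (matches + mismatches + gaps) and that the percentage is truncated rather than rounded. The function name `percent_identity` is hypothetical and the formula is an assumption, not the search program's documented definition.

```python
def percent_identity(matches, mismatches, gaps):
    """Share of matching positions among all aligned columns, in percent.

    Assumption: gaps count as aligned columns. The printout does not
    state the definition used by the search program.
    """
    aligned_columns = matches + mismatches + gaps
    return 100.0 * matches / aligned_columns


# Counts taken from the alignment statistics above.
identity = percent_identity(matches=2194, mismatches=203, gaps=97)
print(int(identity))  # truncated to a whole percent
```

Under this reading the untruncated value is just under 88%, so the printed figure is consistent with truncation but not with rounding to the nearest percent.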




X 10 20 30 40

ATGAGAGTGA-AGGAGAAATATCAGCACT-TGTG—GAGAT GG GG-GTGGAAA 


i l i i i 


ATAAGAGAAAGAGCRlFAGACAGiTGGCAATGAGAGTGAAGGGATCAGGAAGGAATTATCAGCACTTGTGGAGA 
310 320 330 340 350 360 370

50 60 70 80 90 100 110 120

TGGGRCACGATGCTCCTTGGRAT ATTGATGATCTGT AGTGCT ACAGAAAAATTGTGGGTCACAGTCT ATTAT 
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HIVMAL 0229 bp ss-FNA VRL 15-JUN-1989 

Human immunodeficiency virus type 1, isolate MAL, complete genome.
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□PH" AATGA' f Ad ACCRGCTATACGTTGACAAGTTGTaacacctcagtcattacacaggcctgtccaaaggta 


, t t I t i I : t ( i t t * * t i * i t i i i i i i • 


i I i i t i i i t > 


GA7AG7GA7AA7A67AG77ATAGGC7AA7AAA77G7AA7ACC7GAGTRRTTRCRCRGGCTTGTCCAARGGTR 
6370 6380 6390 6400 6410 6420 6430

640 650 660 670 680 690 700 710

7CC777GR^CCRAT7CCCA7ACR7TA7767GCCCCGGC7GG7777GCGA77C7AAAA767AA7AA7AAGACG 

;;;;;;; : i ! i i ! i ? i ; ; ; ; ! ; i ! > * < t « t . . . ...... ...... 

ACCl*77GA7Gr;AA7TGCCA'rAG l A7 TA77G7GCCCCAGC7GG7777GCAA77C7AA AG7G7 AA7GA7 AAGAAG 

6440 6450 6460 6470 6480 6490 6500

720 730 740 750 760 770 780 

77CAA76GAAGP,179 AGC A7G TAG A AA7G7GAGC AC AG: 7 ACAA7G7 ACACA7GGAA77AGGCCAG7 AG7A7CA 
: : I ; I ! ; \ \ \ \ \ \ \ \ \ : I ! : 1 : : 1 ! I : ; : : : : : : : : : : : : : : : : : : : : : : : : 

T 7 CAA 7 GGA 6 VJGr ;RR RT RTG1'AA R A A7G7CRG7 AGAG7 ACAA7G7 AC ACA7 GGAA77 A AGCC AG7GG7G7CA 





6510 6520 6530 6540 6550 6560 6570 6580 

790 800 810 820 830 840 850

ACTCAACTGCTGTTGAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATTAGATCTGCCAATTTCACAGACAAT 

r : t i ) t i i i t i • t i t t i i i t t r t i r i t t t i i i t t t i i t it i t i ■ i t t i t t t i i i i i t t t t i t i t t 

ACTCAACTGCTCirrftAATGBCftGTCTAGCAGAAGAAGAGATAATGATTAGATCTGAAAATCTCACAGACAAT 
6590 6600 6610 6620 6630 6640 6650

860 870 880 890 900 910 920

GCTAAAACCATAATASTACAGiCTGftACCAATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGA 

I I I t i t I I I I I t I I i t t t I I I ll it I I I I t I I I I I t I I i I t I I I I II 1 I I I I I I i I I I I 

ACT AAAAACAT Af'.TP.QT ACAGCTT AATGAftftCTGT ftACAATT AATTGT ACAAGGCCTGGAAACAAT ACAAGA 
6660 6670 6680 6690 6700 6710 6720

930 940 950 960 970 980 990

AAAAGTATCCG;ATCCAEAGGGGftCCAGGGAGAGCATTTGTTACAATAGGAAAAATAGGAAATATGAGACAA 

t ■ t ti : i ti ii t i i i i i iiii i i i i i i iii i i i i i i i i i i i til i 

i i i it i i it ii ■ i i i t i iiii i i i i i ■ iii i i i i i i i iiii tit i 

AGAGGGATAC.ATTTC-RGCCC AGGGCA AGCACT CT AT AC AACAGGG AT AGT AGGAG AT AT AAGA AGA 

6730 6740 6750 6760 6770 6780 6790 

1000 1010 1020 1030 1040 1050 1060 1070 

GCACATTGTAACATTAGTAGAGCAAAATGCAATGCCACTTTAAAACAGATAGCTAGCAAATTAAGAGAACAA 

iii i t i i t i till t t it iiii ii i i i i i i i i i i i i i i i i iii ii ii i i 

lit I i i i i i i;tt ! i t i iiii ll i i t 1 i i i i i i i i i t i i ill ll it i i 

GCATATTGTACTATTAATGAAACAGAATGGGATAAAACTTTACAACAGGTAGCTGTAAAACTAGGA—AGC— 
6800 6810 6820 6830 6840 6850 6860

1080 1090 1100 1110 1120 1130 1140 

TTTGGAAA T AA1AAAACAATAATCTTTAAGCAATCCTCAGGAGGGGACCCAGAAATTGTAACGCACAGTTTT 

>< It I I I I * I I ■ I I I i 1 I t I l 1 I I i i i i l i i I I I I i i i i i i i i i l ill I I t I I t I I I 

I • ll II I It I I I t t t I l l I t I I I I I t 1 1 I I I I I I I I I I I I I I I I I III I I I I I I t I I 

CTTCTTAACAAAACAAAAATAATTTTTAATTCATCCTCAGGAGGGGACCCAGAAATTACAACACACAGTTTT 
6870 6880 6890 6900 6910 6920 6930

1150 1160 1170 1180 1190 1200 1210

AATTGTGGAGGGGAAT TTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGTTTAATAGTACTTGG 

iiii'i i i i i t i t t t i i i : t i t i i i i t i i it it i i i i i i i i i i i i i i i i iii till i ii 

• < ■ i i i i i i i i t i i i i t i t i t i i i i i i i i i i i i i i i t i i i i i i i i i i t i iii iiii i ii 

AATTGTAGAGGGGAATTTTTCTACTGTAATACATCAAAACTGTTTAATAGTACATGGCAGAATAAT—GGTGC 
6940 6950 6960 6970 6980 6990 7000

1220 1230 1240 1250 1260 1270 1280 

AGT ACTGA AGGG'. iCAAAT AAC ACTGA AGGAAGTG AC AC AAT C AC ACT CCC ATGCAGAAT AA AACAATTT AT A 

i til iii iiii til it ii it t i i i i i i i i i i i i i i i i l t i i i i i i i i i i i i i i 

■ iii lit i t i i iii it ii ii t i t i i i i i i i i i i t t t t i i i i i i i i i ■ i i i i i i 

AAGACT—AAG-TAATAGCACAGAGTCAACTGGTAGTATCACACTCCCATGCAGAATAAAACAAATTATA 

7010 7020 7030 7040 7050 7060 7070 

1290 1300 1310 1320 1330 1340 1350 

AACATGTGGCAGGAAGT AGGAAAAGCAATGT ATGCCCCTCCCATCAGCGGACAAATT AGATGTTCATCAAAT 

* i . iii i i i i i i t i t i i i i i i i i i i i i i i i i i i i i iii i i i i i i i t i i i i i t 

II I I 1 1 1 I I I 1 It I I 1 I ! I 1 I I I I 1 I I I I I I 1 1 | | I I 1 I I III 1 I I IIII 1 I I I I I I 

AATATGTGGCAGAAAACAGGAAAAGCTATGTATGCCCCTCCCATCGCAGGAGTCATCAACTGTTTATCAAAT 
7080 7090 7100 7110 7120 7130 7140 

1360 1370 1380 1390 1400 1410 1420 

ATT ACAGGGCTGCTATTAACAAGAGATGGTGGTAATAACA-ACAAT-GGGTC—CGAGATCTT CAGACCT 

i i i i ■ > i i t i i > i i i i i t i i i i i i i i i t i iiii i i t t i t i i i iiii iii i i i i i i 

I I I I I I I I 1 I I I I t I I I 1 I I 1 I I I I I I I I 1 1 IIII I I t I 1 I I 1 I till III 1 I I I I I 

ATT ACAGGGCTGAT ATT AACAAGAGATGGTGGAAAT AGT AGTGACAAT AGTGACAATGAGACCTT AAGACCT 
7150 7160 7170 7180 7190 7200 7210 

1430 1440 1450 1460 1470 1480 1490 

GGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTA 

i i I i i i t < t i i i i i i i > t i i .. ii i i i t i I i i i t i I ... i i i i i i i i i i i i i i i i i i i ii 

GGAGGAGGAGATATGAGGGACAATTGGATAAGTGAATTATATAAATATAAAGTAGTAAGAATTGAACCCCTA 
7220 7230 7240 7250 7260 7270 7280 


1500 1510 1520 1530 1540 1550 1560 

GGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTG 


GGAGTAGCACCCAr: 


^AGGnAAARAGAftGAGTGGTRGAAAGAGAAAAi 


I I 1 I I I 1 ill I I I I I i 




IKMC 



7290 7300 7310 7320 7330 7340 7350


1570 1580 1590 1600 1610 1620 1630 1640

TTCCTTGGGTTC r Tt3ir , l^GCO,GC^9GPil 0 i6CPiCTPiTGGGCGCPiCGGTCP\PiTGACGCTGACGGTPlCAGGCCPiGPi 




I I I I I I I I t I t t 


TTCCTTeSUiTrUTTeiiUiftuCftaCAiuSftPilisCRCtaATOCaQCGCAQCt'iTCACTARCGM^TQACQQTACAQQCCAQA 
7360 7370 7380 7390 7400 7410 7420 7430 

1650 1660 1670 1680 1690 1700 1710 

CPlPiTTATTGTCYGBT AT AGTGCAGCAGCAGAACAATTTGCTGAGGGCT ATTGAGGCGCAACAGCATCTGTTG 
\ \ !tt I I ' t ( • t ! I t t t * I I * f;!i* i i t t t t i i : » i t t i t t i i i t i i i » i t t t t i i i i i i r t t i t i 

CAGTTACTGTCTGGTATAGTGCAACAGCAAAACAATTT6CTGAGGGCTATAGAGGCGCAACAGCATCTGTTG 
7440 7450 7460 7470 7480 7490 7500


1720 1730 1740 1750 1760 1770 1780 

CAACTCACAGTCTGGGf-'CATCAAGCAGCTCCAGGCAAGAATCOTGGCTGTGGAAAGATACCTAAAGGATCAA 


( i l i l l J t i i t l t i t I t 


GAACTCAUr.TSTCTGGBiGGATTAAACAGGTCGAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAGAGGATCAA 

7510 7520 7530 7540 7550 7560 7570 


1790 1800 1810 1820 1830 1840 1850 

CAGCTCCTGEGGATTTGiGGGTTGCTCTGGAAAnCTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGG 
; : : : : :::::::::::: :::::::::::: :::::::: 
CGGCTCCTAGGAATGTGGGGTTGCTCTGGAAAACACATTTGCACCACATTTGTGCCTTGGAACTCTAGTTGG 

7580 7590 7600 7610 7620 7630 7640 


1860 1870 1880 1890 1900 1910 1920

A6TAAT AAAT CTCT GbiAACAGATTTGuAATAACATGACCTGGATGGAGTGGGACAGAGAAATT AACAATT AC 


i i t i i ( i t i j t i i r i ( t t t t i i i i ■ i i i i t i i i i i i i i i i 

i i i i i i i t t i i i t i i i t i i i i i f i i i i i i t i t i i i ■ i i i i 


AGTAATAGATG-fGTALvATGftCATTTGGAATAATATGACCTGGATGCAGTGGGAAAAAGAAATTAGCAATTAC 
7650 7660 7670 7680 7690 7700 7710


1930 1940 1950

ACAA15CTTAATP.GATTGC.TTAAT' 

It) II I ! I I f I 1 l I I » I 
I » I it lit* t * i : I 1 i 

ACAGGCATAATArACAOCTTAAT' 
7720 7730 7740 


1960 1970 1980 1990 2000

1 3 AAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAA 

. : , I t I t i t t I I ■ | i i i i i t t I i I t i I I I I I » » i I I I i i I l I I I » 
t i i t l I I l t I i I I I t I I > * I I I * * I » * I I * 1 I I I i I t I I I i * * 

'GAAGAATCGCAAATCCAGCAAGAAAAGAATGAAAAGGAATTATTGGAA 
7750 7760 7770 7780 7790


2010 2020 2030 2040 2050 2060 2070 

fTAGPTAAATGC.GCf-lfYGTTT GTG6AATTGGTTT AACAT AACAAATTGGCTGTGGT AT AT AAAAAT ATTCAT A 


lilt till i t t i t i i l i 


TTGG ACA AG’ fG'&l 3GA AHT FTGT 68 A ATT GGTTTAGCAT ATCA AA ATGGCTGTGGTAT AT AAGAAT ATTCAT A 
7800 7810 7820 7830 7840 7850 7860


2080 2090 2100 2110 2120 2130 2140

AT6T- PflGTA6R AGO C'^TGGTAGOTTT AAGAATAGTTTTTGCTGTACTTTCTATAGTGAATAGAGTTAGGCAG 


t i i | t I I l I l i I t l lilt i t i l 1 l l l l l l I I l l 


AT ,48. f AGiT < 453 AGiG.CVT AAT AGGTTTA AGAAT AATTTTTGCTGTGCTTTCTTT AGT AAAT AGAGTT AGGCAG 
7870 7880 7890 7900 7910 7920 7930 


2150 2160 2170 2180 2190 2200 2210

GGA"! ATTCACGA FTA"i GATTrCAG.ACCCACCTCCCAACCCCGAGGGG-ACCCGACAGGCCCGAAGGAATA 

,i,ii t t i < t t t i i i i i t i i i i t i « t i t t » i t i i i i i i i » t i t t , i i i t i i t t * i i i i i i i i 

i , , i i it**: j i i i i . i : t t t i t i i i » t i i i i i i i i i i i t i i i i i i i t i i i i i i ... 

GGATACTCACCTCTGTCGTTGGAGACCCTCCTCCCAACACCGAGGGGACCACCCGACAGGCCCGAAGGAATA 

7940 7950 7960 7970 7980 7990 8000

2220 2230 2240 2250 2260 2270 2280

GAAc AAGAAi-OTfGGArnGAGnGACAGACnCAGATCCATTCGATTAGTGAACGGATCCTTAGCACTTATCTGG 
I I ! J ! J I ! I ■ ! i ! ! ’ * ' ! ! I I ! I ' I ! ! ! ! ' I ! ! I I I 1 ! ! i t t » * * « * » i » * • * • * • * * * * * • * * 

GAAGAAGAAlv.'-.TGG.PGAGGAAGGCAGACGCAGATCAATTCGATTGGTGAACGGATTCTCAGCACTTATCTGG 
8010 8020 8030 8040 8050 8060 8070


2290 2300 2310 2320 2330 2340 2350

GAGGATG1 1 ■ :7 ; “‘OAlrGC TOT! /COT ' 'TTGAGGTACCAGCGCfTEAGAGAGTT AGTCTTGATTGTAACGAGGAT 


GACGAG*:; tv*W Y-y-'Cr: - j i? i «?rCP f '■T" r nAi3TTACCACCGCTTGAGAGAGTTACTCTTAATTGGAACGAGGAT 









2360 2370 2380 2390 2400 2410 2420 2430

TOTGGRmC T T CTGGGfX"GC06GGGGTGGGRRGCC C7CARATftTTGGTGGAftTCTCCTRCAGTftTTGGAGTCA 


8150 8160 8170 8180 8190 8200 8210 8220


X 

i.^5P ir QCTOOf'AG 

G G A O C T G r > H L;i 
8230 
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Results file Ku nz- i 58 - c 1 33umb 1 . res made by Sheppard on Thu 8 Mar 30 11 « 32«28-PST. 


Query sequence being compared: KUNZ-158-CL33.SEQ
Number of sequences searched: 3460
Number of scores above cutoff: 10

Results of the initial comparison of KUNZ-158-CL33.SEQ with:
Data bank: UEMBL 21, all entries

[Histogram of initial scores: NUMBER OF SEQUENCES (logarithmic axis, 0 to 10000) against SCORE (0 to 1872 in steps of 208) and STDEV.]

PARAMETERS

Similarity matrix         Unitary        K-tuple             4
Mismatch penalty          1              Joining penalty     30
Gap penalty               80             Window size         32
Gap size penalty          [illegible]
Cutoff score              143
Randomization group       0
Initial scores to save    20             Alignments to save  10
Optimized scores to save  20             Display context     0


SEARCH STATISTICS

Scores:    Mean    Median    Standard Deviation
           40      38        46.71

Times:     CPU            Total Elapsed
           00:07:27.04    00:32:23.00

Number of residues:            5125938
Number of sequences searched:  3460
Number of scores above cutoff: 10


The scores below are sorted by initial score.
Significance is calculated based on initial score.

A 100% identical sequence to the query sequence was not found.

The list of best scores is:


                                                   Init.  Opt.
Sequence Name Description                   Length Score Score   Sig. Frame

             **** 39 standard deviations above mean ****
 1. REHTLV3  Human T-cell leukaemia type II   9748  1872  2176  39.22     0
 2. HIVH3CG  Human T-cell lymphotropic viru   9749  1872  2176  39.22     0
             **** 25 standard deviations above mean ****
 3. HIVELICG Human lymphadenopathy virus (E   9176  1246  1893  25.82     0
             **** 18 standard deviations above mean ****
 4. HIVMALCG Human lymphadenopathy virus (M   9229   916  2041  18.75     0
             ****  5 standard deviations above mean ****
 5. HIV2RODX Human immunodeficiency virus t   9671   306  1185   5.69     0
 6. REHTLV4C STLV-3 (HTLV-4) partial provir   5391   299  1247   5.54     0
 7. RESIVAXX Simian immunodeficiency virus    9264   294  1240   5.44     0
 8. RESIVMXX Simian immunodeficiency virus    9646   290  1240   5.35     0
             ****  4 standard deviations above mean ****
 9. RESIV251 Simian immunodeficiency virus    1142   259   565   4.69     0
             ****  3 standard deviations above mean ****
10. M15127   Figure 1. Structure of the art    306   184   298   3.08     0
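
The Sig. column appears to follow from the SEARCH STATISTICS block: each value is the number of standard deviations by which an initial score exceeds the mean of all scores in the search. A minimal sketch in Python, assuming that reading of the column; the function and variable names are illustrative and are not part of the search program:

```python
def significance(score: float, mean: float, stdev: float) -> float:
    """Number of standard deviations by which `score` exceeds `mean`."""
    if stdev <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - mean) / stdev


# Values as printed in the SEARCH STATISTICS block (the mean is a rounded figure).
MEAN = 40
STDEV = 46.71

# Initial scores of the ten listed sequences, in the order of the table.
initial_scores = [1872, 1872, 1246, 916, 306, 299, 294, 290, 259, 184]

for score in initial_scores:
    print(f"{score:5d}  {significance(score, MEAN, STDEV):6.2f}")
```

With the printed mean (40) and standard deviation (46.71), an initial score of 1872 works out to 39.22 and a score of 184 to 3.08, in line with the first and last rows of the list.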


The scores below are sorted by optimized score.
Significance is calculated based on optimized score.

A 100% identical sequence to the query sequence was not found.

The list of best scores is:

                                                   Init.  Opt.
Sequence Name Description                   Length Score Score   Sig. Frame

 1. REHTLV3  Human T-cell leukaemia type II   9748  1872  2176   0.00     0
 2. HIVH3CG  Human T-cell lymphotropic viru   9749  1872  2176   0.00     0
 3. HIVMALCG Human lymphadenopathy virus (M   9229   916  2041   0.00     0
 4. HIVELICG Human lymphadenopathy virus (E   9176  1246  1893   0.00     0
 5. REHTLV4C STLV-3 (HTLV-4) partial provir   5391   299  1247   0.00     0
 6. RESIVMXX Simian immunodeficiency virus    9646   290  1240   0.00     0
 7. RESIVAXX Simian immunodeficiency virus    9264   294  1240   0.00     0
 8. HIV2RODX Human immunodeficiency virus t   9671   306  1185   0.00     0
 9. RESIV251 Simian immunodeficiency virus    1142   259   565   0.00     0
10. M15127   Figure 1. Structure of the art    306   184   298   0.00     0

SEARCH STATISTICS

Scores:     Mean        Median      Standard Deviation
            217F        2177        0.00

Times:      CPU                     Total Elapsed
            00:00:00.00             00:00:00.00

Number of residues:             73322
Number of sequences optimized:  10



1. KUNZ-158-CL33.SEQ
REHTLV3 Human T-cell leukaemia type III (HTLV-III) provira
ID REHTLV3 standard; RNA; 9748 BP.
XX
AC X01762;
XX
DT 03-SEP-1987 (an correction)
DT 01-SEP-1987 (an correction)
DT 03-AUG-1987 (an correction)
DT 29-OCT-1986 (minor modification)

DT Ob-NOV-198o < KW added) 

DT 26-MAR-1985 (first entry)
XX
DE Human T-cell leukaemia type III (HTLV-III) proviral genome
DE (AIDS virus for acquired immune deficiency syndrome)
XX
KW acquired immune deficiency syndrome; direct repeat; endonuclease;
KW glycoprotein; inverted repeat; protease; provirus;
KW reverse transcriptase; terminal repeat.
XX
OS Human T-cell leukemia virus type III.
OC Viridae; ss-RNA enveloped viruses; Retroviridae.
XX
RN [1] (bases 1-9748)
RA Ratner L., Haseltine W., Patarca R., Livak K.J., Starcich B.R.,
RA Josephs S.F., Doran E.R., Rafalski J.A., Whitehorn E.A.,
RA Baumeister K., Ivanoff L., Petteway S.R. Jr., Pearson M.L.,
RA Lautenberger J.A., Papas T.S., Ghrayeb J., Chang N.T., Gallo R.C.,
RA Wong-Staal F.;
RT "Complete nucleotide sequence of the AIDS virus, HTLV-III";
RL Nature 313:277-284(1985).
XX
RN [2]
RA Muesing M.A., Smith D.H., Cabradilla C.D. Jr., Benton C.V.,
RA Lasky L.A., Capon D.J.;
RT "Nucleic acid structure and expression of the human AIDS/
RT lymphadenopathy retrovirus";
RL Nature 313:450-458(1985).
XX




FH 

Key 

Frew 

To 

FH 




FT 

XNVREP 

l 

p 

FT 

SITE 

l 

634 

FT 

PRM 

427 

430 

FT 

SITE 

453 

'-3-53 

FT 

OPR 

4-54 

454 

FT 

S I TE 

454 

55 1 

FT 

SITE 

55R 

634 

FT 

XNVREP 

633 

634 

FT 

ST TE 

635 

653 

FT 

CD! 9 

787 

l l G2 

FT 

CHS 

787 

0371 

FT 

CDS 

1 18.3 

7321 

FT 




FT 




FT 




FT 




FT 

QpT 

I 368 

2007 

FT 

RPT 

703 i 

2065 

FT 

CDS 

200 1 

5125 

FT 




FT 




FT 




FT 

RrT 

7 l 73 

2163 

FT 

rh r 

7164 

2 176 

FT 

CDF 

5040 

5848 

FT 




FT 

CPC 

637-'! 

002? 

FT 

CVS 

6375 

8G21 

FT 

SITE 

7‘ T 36 

7787 

FT 

CDS 

7767 

882 1 

FT 





Description 

inverted repeat 

long terminal repeat 

T ATA—box 

U3 region 

cap site 

R region 

Uf> reg i on 

inverted repeat 

tRNA binding site (tRNA-Lys) 

gag p17 

gag precursor polypeptide 
ejag p24 and gag pl5 for 
major capsid protein and for 
put. retroviral nucleic acid 
binding protein (NBP)< re f. 2) 
(boundaries not defined) 
direct repeat 
direct repeat 

pol precursor polypeptides 
put., protease at S’ terminus 
reverse transcriptase 
put. endonuclease at 3’ terminus 
direct repeat 
direct repeat 

SOR short open reading frame 

pot. vestigial env gene 

env-lor precursor polypeptide 

envelope glycoprotein 

put. peptide cleavage site 

put, lor transmembrane 

icrotf 1 1 n_ 



FT 

SITE 

90?Ti 

3103 

poly purine stretch 

FT 

site 

31 13 

3567 

U3 reg1on 

FT 

Rf-T 

3 1 l 8 

3745 

1ong tern1nal repeat 

FT 

S 1 TE 

■ZJt— ’-'I't-i 

SE5 58 

R region 

FT 

SITE 

SS4 1 

3S46 

polyadenylation signal 

FT 

SITE 

3353 

8748 

uc region 

FT 

INVREP 

3747 

3748 

inverted repeat 

XX 

SQ Sequence 9748 BP; 3431 A; 1781 C; 2368 G; 2168 T;

Initial Score    = 1872   Optimized Score = 2176   Significance = 0.00
Residue Identity =  86%   Matches         = 2243   Mismatches   = 150
Gaps             =  10^   Conservative Substitutions            = 0
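
As a consistency check on the SQ line, the four base counts should add up to the stated sequence length. A small Python sketch, assuming the counts read as 3431 A, 1781 C, 2368 G and 2168 T (the printout is damaged at this point); the variable names are illustrative:

```python
# Base counts as read from the SQ line (assumed reading of a damaged printout).
composition = {"A": 3431, "C": 1781, "G": 2368, "T": 2168}

# Sequence length stated on the ID line of the same entry.
stated_length = 9748

total = sum(composition.values())
gc_fraction = (composition["G"] + composition["C"]) / total

print(f"sum of base counts: {total}")
print(f"matches stated length: {total == stated_length}")
print(f"G+C fraction: {gc_fraction:.3f}")
```

Under that reading the counts sum to 9748, the length given on the ID line, which supports the reconstruction of the damaged figures.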


X i.O 20 30 40 50 60 

ATGAFiAGTGA-AGSAi-:AA-A'rATCAGCACTTGTGGAGA-TGGGGGTGGAAATGGGGCAC--CATGCTCCTTGG 

: : ; ; : : : : :: : : : : ::: : : : :: 

CTAATAGAAAGAGCAGAAGACAGTGGCAAT-GAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGG 

6230 6240 6250 6260 6270 6280 6290 

70 80 SO 1OO llO 120 

GATATTGATG---AT-CT-GirAGTGCTACAGAAAAATTGT-GGGTCACAG-TCTATTATGGGGTAC— 

, . , . , , ; t i t i i r i il i l i i i i iiii t i i i i i t ii 

, , , t l t III! ,|l. il I i l l l i lilt i i I I I I . it 

GeTKGAGftTGGG^CP.CCATeCTCCTTGGePiTGTTeATGATCTGTAGTGCTACAGWAAAATTGTGGGTCACAG 

E.3C0 631 0 6320 6330 6340 6350 6360 

130 140 150 160 170 180 

-CT-STGTGEAA-GGAAGCA A—CCACCA--CTCTATTTTGTGCATCAGATGCTAAAGCAT ATGAT 


TCTATTATGGGGTACCTGTGTGGAAGGAAGCAACCACCACTCTATTTTG- 
6370 6380 6390 6400 6410 


-TGC-ATCAGATGCT 

6420 


190 200 210 220 230 240 

A-CAGA6G-TACA1 A-A"i —GTTTGGGCCACACATGCCTG—TGTACCCACAGA-CCCCAACCCAC 

, t i : i t i i i il i t i i i » i i i iij ill i i i » i i i i i i i 

i i i i ; t i t i i i i tit iii iiii i i i i i i i 

AAAGCATATGATACAGAGGTACATAATGTTTGGGC—CACA—CATGCCTGTGTACCCACAGACCCCAACC-C 
6430 6440 6450 6460 6470 6480 6490 


250 260 270 2S0 290 300 310 

AAGAAGTAlATATTGG iAAATGTGACAGAAAATTTTAACATGTGGAAAA—ATGACATGGTAGAACAGATG-C 

t III I ; ; • : , I 1 | till I III I I 1 I 1 I IIII II Ilf I III I 

J *11 lit! til II 1 till I III i i i i i 1 till It III i III i 

A—CAAGAAGTA-GTA—TT6-GTAAAT-GTGACA-GAAAATTTTAACAT—GTGGAA—AAATGAC 

S500 6510 6520 6530 6540 6550 

320 330 340 350 360 370 380 

ATGAl SEAT AT A A TCAtl -TTTATG-GGAICAAAGCCTAAAGCCATGTG-TAAAA—TTAACCCCACTCTGTGT 

i * i it i : it: i i ; t iiii ii i t ill i t ill ill III i i i i i 

i i i i , , : iii t t i : till it i i til i i lit iii ill i i i i i 

AT-GG"! 'A6A A -CAGATGCATGAGGATATAA-TCAGT7 T ATGGGATC A AAGCCT AAAGCCA TGTGT 

66CC 6570 6580 6590 6600 6610 

3S0 400 410 420 430 440 

TAG7 TTAA—AGO GCACTGAYTTGG-GGAATGCTACTAAT-ACCAATACTAGTAATACCAATAGTAGTA 

! | ' J ! J • ' * * * t i tit til il t ill il i i i t i i i » i i t i i i i i i 

TTAACCCCACTCTGTGT fAGTTTAA AGTGC-ACTGATTTGAAGAATGATACTAATACCAATAGTAGTA 

6620 6630 6640 6650 6660 6670 6680 

450 460 470 480 490 500 510 

GCGGGGAAATGATGATGGAGAAAGGAGAGATAAAAAACTGCTCTTTCAATATCAGCACAAGNATAAGAGGTA 

, , , | t i i t i i t i , i i , t i i j i i i » i i i i i t i i i t i i • .. i i i i i i t i i i i i i i i i i i i i i t 

I ! * , t , ( r ; | , l 1 1 t I | 1 1 1 I I I t I l I 1 I 1 I 1 1 I 1 I I I 1 I '• 1 1 1 * 1 > * 1 * 1 1 1 ' I t I I I I I I 1 1 

GC6C iGAGAATG ATAATGGAGAAAGGAGAGATAAAAAACTGCTCTTTCAATATCAGCACAAGCATAAGAGGTA 
6660 6700 6710 6720 6730 6740 6750 

520 530 340 550 560 570 580 

AGGTGCAGAAAGAATfYfGCATTTT'! TTATAAA1fTEATATAAT ACCAAT AGAT AATGAT ACTACCAGCT AT A 

; ; ! ! ; ! ; ! ! I ’ ; ! i I ! ! ! ! ! ! ! ! ! : ! ! ! ! ! ! ! ! 1 ! I ! ! ! I ! ! ! I ! ! ! ! ! ! ! ! i ! i ! • . i . i . i • i i i ■ i ■ i • 

Af»niTf’*i~: AG AAAP’AA*rATnCATTTTTTTATAAAt ITTYvATATAAT ACCAATAGATAATGATACTAGCAGCT AT A 







G7B0 


6790 


6800 


6810 


6820 


590 600 610 620 630 640 650 660 

06T': O^COPiBTTBTR^CftCCTCPiGTCATTPiCPiCAGGCCTGTCCAPiPiGGTRTCCTTTGAGCCAATTCCCPtTAC 
I I I I !!!!!■!!!!! I ! ! I !»•’!(!! I ! I * i !< i • i <*>«** » < *' * • * * 1,1 1 * 1 1 * 1 ' 1 1 1 1 * ’ 1 ' 1 

CGTTGACftftGTTGTAAOACCTCftBTCATTACACAGeiCCTGTCCAAAGGTATCXTrTTGfW^CAATTCCCATAC 
SC 30 68-’O 6850 6860 6870 B8SQ 6830 

670 830 S90 700 710 720 730 

G'lT TftT'DiTGCCCCGGCTGGTTTT 603 ATTCT AAAATGT AftT 6 AT AAGACGTTCAATGGAACAGGACCATGT 6 


i l t i l i i l l l ■ 


ftTTtt ITS! 13CCCCGGCrGGTTTTGCGATTCTAAAATGTAATAATAAGACGTTCAATGGAACAGGACCATGTft 
6300 63 l O 5320 6330 6340 6350 6360 6370 

■740 750 760 770 780 730 800 

CftAPiTGTCAGCACACn ftCftft TGTftl 1ACATGGAATTASGCCAGTftGTftTCftftCTCftftCTGCTGTTGftftTGGCft 
I I ! I I I ! ! ! ! I ! I ! ! ! ! ! i i ! ! ! ! ! ! ! ! > ! ! > ! ! ‘ i i i i • ! ! ! ! ! ! ! ! i > ! • i • i ■ > > > > i > 1 > * * ■ * ■ • ’ 

CAAftTGTCAGCACASTACAATGTAC^ATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCA 
6980 6930 7000 7010 7020 7030 7040 

QIC 320 330 840 850 860 870 

raTCTABCftl-’PftGftAGAljiGTAi.rjTftrrrTAGATCTGCCftftTTTCACPGACAATGCTAAAACCATAATAGTACAGC 

; ; ; ; ; ; ; ; ; ; ! ; ; ; ; ; ; ; ; I ! ! ; ! ; ! i ! ! ‘ ! ! i ! ! • i ! ! i > ! I ! ’ ! > '< • < • i • < • > • > • ■ • • • .. 

GTGTCiBGAGftft'SftftGftCiGT ABTftftTTAG: ATCTGGCA ATTTCACAG ACA ATGCT ftftftftCCftT ftftT ftGT ftCftGC 
7050 7060 7070 7080 7030 7100 7110 

880 830 300 310 920 330 340 

TGAACCAATGT (5 T ASAAATT AATTGT ACAAGACCCAACAACAAT ftCftftGftftftftftGT ftTCCGT ftTCCftGftGGG 

, , , , , , , , t , t , , , , , t t , , < i t » i i t t » i t i i i i t f i t i • > * ■ 1 * 1 * * » * » * 1 * * * * * * * * * ’ * * J * I 
, , , ; , , , , , , , ( I < t t t * I 1 ( t f I t t I t l ) t * t t I t t t I I I.. I t t t I t t I I I I t I I I I I I I I I 

TGAACCAATGTST ASAAATT AATTGT ACAAGACCCAACAACAAT ftCftftGftftftftftGTftTCCGT ftTCCftGftGftG 
7120 ','130 7140 7150 7160 7170 7180 

950 5S0 370 980 990 1000 1010 1020 

GACCAGGGAGAG.CAT 7 l GTTACftATftGGftftAAATAGGAAATATGAGACAAGCACATTGTAACATTAGTAGAG 

, , , . , , t t , , , , , , , , i , i t ..* t t t i I * » ( i t i I ) i i < I * * I * * • I • * » ' » ' ' • * 1 .. 

i , t f i - j t i I t i : J ! : i i t i i t i i i i i ! i l t l i i t i i i l i t t t i t i l « i « < i i l i * i ■ 1 * l • 1 * . .* » 

GACCASGEAGAGUA'n I GTTftCftftTftGGAAAAATAGGAAATATGAGACAAGCACATTGTAACATTAGTAGAG 
7190 7700 721.0 7220 7230 7240 7250 


1030 1040 1050 1060 1070 1080 1090 

CAAftATGCAATFCCACTTTAAAACAGATAGCTAGCAAATTAAGAGAACAATTTGGAAATAATAAAACAATAA 


i t t i t i i t t i t i i i i i i i i » t i i i i i i 

) | , I I 1 ! t I 1 I I I I I I I I < I > 1 * I t I I 


CAAAATGGAAT AACAC.TTT AA AftCAGAT AGAT AGCAAATT AAGAGAAC AATTTGGAAAT AftT AAAACAAT AA 
7260 7270 7280 '7290 7300 7310 7320 7330 


11OO 111O 1120 1130 1140 1150 1160 

TCTTTftAGCAAlCCTCAGGftGGGLvACCCftGftAATTGTAACGCACAGTTTTAATTGTGGAGGGGAATTTTTCT 

,,, tl i r ... . » t l t .... 

| 1 t : I t t I 1 ! 1 ! . I t t I ( I 1 I I I * I * I t t ♦ * I • ' I ' » ' ' 1 » ‘ . .( I I 1 I I I . . ) t t t I I 

TCT1 TAAECAGTCCTCAGGAQGSGACCCftGAAATTGTAACGCACAGTTTTAATTGTGGAGGGGAATTTTTCT 
7340 "’350 7360 7370 7380 7390 7400 

1i70 1130 11SO 1800 1210 1220 1230 

ACTGT AATTCAACACAACTGTTTAAT.AGT ACTTGGTTT AAT AGT ACTTGGAGT ACTGAAGGGTCAAAT AACA 

, , i i r i i i i . i i i i i i f i i t t i t » i i . i i i i i i i i i i i i i t i i * ' » • ' » « 1 ' * 1 ' ' . i » t t • i t t i i i i i J 

i t i t t I i I i i t t t i i , t r ( i ] , t i t t f i ■ t i i r i i i t i i t i t t » i i t i i i i i ■ i i i . i . t i i t . t . . i t . t 

ACTCiTAATTCA AC AGAACT6TTT AAT AGT ACTTGGTTT AftT ftGT ACTTGGAGT ACT AAAGGGTCAAAT AACA 
7410 7420 7430 7440 7450 7460 7470 

1240 1250 1260 1270 1280 1290 1300 

CTGAAGGAAGTGACACAftTCftCPCICCCATGCAGAATAAAftCftATTTATAAACATGTGGCAGGAAGTAGGftft 

i! !■ ‘ ! i ■!'!<!•• i !!'• > .. 

CTGAAGGAAGTGACACAATCACCC i'CCCATGCAGAATftftftACftftftTTATAftftCATGTGGCftGGftftGTAGGftft 
7480 7430 7500 7510 7520 7530 7540 


1310 1320 1330 1340 1350 1360 1370 1380 

AAGCAATGTATC :CGCC rCCCATCAGCEGACAAATTAGATGTTCATCAAAT ATT ACAGGGCT6CT ATT AACftft 




7550 


7S£<> 7570 7580 7590 7600 7610 


1330 1400 1410 1420 1430 1440 1450 

EiAGATGGTCGT'AA'rAACAACP.ATGCGTCCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGA 


I ! I t t I 


I t I I I 


GASATGGTGeTAATAGCAACAATeaAeTCCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGA 
7620 7630 76'? O 7650 7660 7670 7680 7660 


1460 1470 1430 1430 1500 1510 1520 

GAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAA 

j | ; I I J I ! ! J ! ! ! J ! 1 I 1 I ! ! ! ! ! ! 1 ! ! I • » i t i i i t i i i i i i i < i i i * » i * i * * * » » ' * ' • * * • • 1 • * 1 1 • 

GAAGTGAATTATfATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAA 
7700 7710 7720 7730 7740 7750 7760 


1530 1540 1550 1560 1570 1580 1590 

GAG7 GGTGCAGAGASAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAA 


GA 


iTGGTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAA 

7770 7780 77S0 7300 7810 7820 7830 


1600 1o1O 1620 1630 1640 1650 1660 

GCACTATGGGCGCACGGTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGC 

, , , ) | i t i < ! I i r i t i i t i i i » t » i i l t l < t l i l i ..I t i i l i i l t t t i l i i 1 l ) l 1 l l 1 l 1 * > 

i , i t ) j i i i i i r i t • : i t t t i i t i t > J t i t i t i i i i i t i i .. . * t t i i i t i i t • i t • i i i i i i i i i 

GCAGTATGtyuCGCAGCGTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGC 
7840 7350 7880 7870 7880 7890 7900 

1670 1680 1630 1700 1710 1720 1730 1740 

AGAACAAT TTGOTGAGi .'GCTATTGAGGCGCAACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGC 

1 ) I t l • t l l t t i * i i t i t ! : i t l t i ■ l i i t 1 t l t l * i l i i t < t I t l I i i < l l i i • I • ■ l I * ( » i i I I • I I t 

AGAACAATTTGCTGAGRGCTATTGAGGCGCAACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGC 

7910 7S20 7930 7940 7950 7960 7970 


1750 1760 1770 1780 1790 1800 1810 

TCCAGGCAAGAATCCTGGCTSTGGAAAOATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTG 

I ; I t t : 1 i t i i i i t : t t I I i I 1 t : i i t i : l i t i : t i I i I i i I t i ) t i < I i ■ i I i i i i i i i i i i I i i i i ■ i i I 

i i ■ i i t i i : i t ' i i i i t i i i * i i t i i i t t i ) l i t i i i t i i i i i i l t i * i i t * t i i i i ■ i l i i i l i i t i l i > t 

' rCCAG&CftftGAA!CC‘ I 6GCTGTGGAAAGATACCTAAAGGATCAACAGGTCCTGGGGATTTGGGGTTGCTCTG 

7980 7930 8000 8010 8020 8030 8040 8050 


1820 1630 1840 1350 1860 1870 1880 

GAAAAGTCATTT laCACCACTGCT G'l 'GCCTTGGAATGCT AGTTGG AGT A ATAAATCTCTGGAACAGATTTGGA 

I i i I i i i t i i : t i t i i i i i t i i t i i < i t i ■ i t ■ t t i i t i i t i i t i I i < i i ■ i i i i i • I i I i i • • i i i < • I i i 

i t t t t t i i i ■ t i t i ! i t t t i i i i i i i i i i i i i t i t i i i i < i ) i i i • t i t t i i t t i i i » I t i i i i i t i i i 

GAAAACTCATTTGCftCCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGGA 

8060 8070 8030 8090 8100 8110 8120 


1830 1900 ISIO 1920 1930 1940 1950 

AT AACATGACCTGGA7GGAGTGG6ACAEAGAAATTAACAATTACACAAGCTTAATACATTCCTTAATTGAAG 

i I I I i * i t t I • i t i : : ! : t : : : i : i t ( I i i t t i l t i i t l i ... i i i t » l i t t t i t t l i i i t i i i i 

i | ! ► t : I 1 I 1 I ! I 1 1 1 l t •. t t i i i i t t i i t t I I 1 i i t ... i i i i t t i i i i ft t i i i i i i t i i t t t i 


AT'A ACATGACl;' l 'GGATGGAG'TGGGACAGAGAAATT AACAATT ACACAAGCTTAAT ACACTCCTTAATTGAAG 

8130 3140 8150 3160 8170 8180 8190 


1960 1670 1930 1990 2000 2010 2020 

AATCGCAAAAnCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATT 

i i i t i i i t t i i t i i i i i 1 i i l t ; t t i * i t ( t I T t t t t i i t 1 I I I l 1 t i i . . I I I I < I i t I i i i > I 

AATCRCAAAftCCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATT 
8200 821O 8220 3230 8240 8250 8260 


2030 2040 2050 2060 2070 2080 2090 2100 

GGTTTAACATAACAAATTGGCTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAA 


GGTTTAACATAACAAATTGGCTGTGGTATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAA 


8270 0280 8890 8300 8310 8320 8330 


8110 

GAA7AGTTT 

t t i i i t i i t 
i l l i t : i ) i 

(••jflQ'i nr-;T r f 


9120 2130 2140 2150 2160 2170 

7t:CTGT ACT!TCTA'TAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCC 

, i t r t t i l t i ) • ; t t i t t I I ■ < I t ( ■ I i t i ■ f I I I i ■ i * t t t t I t I ■ i ■ * ■ I t ( t t t « t I i 


TiT.-T— i nCT~i T(':T4 .'ftGT(Ti4Pi'7 4lnAi: ;TTfti~GCAOf9raftTA_TTCAC_CAXIATCGIII£AGACCC 








8340 


33G0 


8380 


8390 


8400 


8410 


2180 3ISO 2200 2210 2220 2230 2240 


i t i * i i i i i 


RCCTCCCRRTCCCGRUi'-iiGGRCCCGRCACiGCCCGRRGGRATRGRRGRRGRRGGTGGAGAGRGAGACRGAGRCA 

8420 £’.430 8440 8450 8460 8470 8480 

2250 ■: 2S0 2270 2280 2290 2300 2310 

GATCCATTCSRTTACnGRRCEiGATUCTTAECRCTTRTCTGGGRCGRTCTGCGGRGCCTTGTGCCTCTTCRGC 
i J 1 J ! I I I ! ! ‘ ! ! ! ! ! I ! i I • ! I ■ ! I ! I I ’ i I * ! i I I ! i ! i i i i « « i ‘ * t * * ' * * • < • 1 1 1 • ‘ 1 1 1 * 1 1 

CiATCCftTTCFRTTAGTFRACGGRTCCTTRGCRCTTRTCTGGGRCGRTCTGCGGRGCC-TGTGCCTCTTCRGC 
8430 8500 351O 8520 8530 8540 8550 

2320 2330 .2340 2350 2360 2370 2380 

TRCCnCC60TTC.'R6RGnGTTRCTCTTGRTT6TRRCGRGGRTTGTGGRRCTTCTGGGRCGCRGGGGGTGGGRR 

I ! I | | I | | I I ! I I J J J J I ! ! ! I ! J I ! J ! I ! ! I ! J ! J ! * ! I I I ! I I I * » • i * » < i > < < » » • • > * • * 1 ■ * * * 1 ■ 

TRCCRCCGCTTGRGReRCTTRCTCTTGRTTGTRRCGAGGRTTGTGGRRCTTCTGGGRCGCRGGGGGTGGGRR 

8560 8570 8530 8590 8600 8610 8620 


2390 24-00 2410 2420 2430 2440 

GCCCTCRRATRTTGGTbiGAATCTCCTRCRGTRTTGGAGTCRGGARCTRRAG 


GCCCTCRRATRT TI33TIIGRRTCTCCTRCRSTRTTGGRGTCRGGRGCTRRRG 
8630 9640 8S50 86SO 8670 X 


2. KUNZ-158-CL33.SEQ
HIVH3CG Human T-cell lymphotropic virus type III, complete
ID HIVH3CG standard; RNA; 9749 BP.
XX

AC KC2010 - K 02008 * K0200S > 

XX 

DT OS-SEP-1987 < an correction) 

DT 03-SEP-1987 (an correction)
DT 03-SEP-1987 (an correction)
DT 02-SEP-1987 (an correction)
DT 01-SEP-1987 (an correction)
DT 23-JUN-1987 (minor modifications)
DT 28-OCT-1986 (incorporated)
XX
DE Human T-cell lymphotropic virus type III, complete reference genome
DE (isolates HXB2, HXB3, BH10, BH5 and BH8 of HTLV-III DNA).
XX
KW acquired immune deficiency syndrome; complete genome; env gene;
KW gag gene; long terminal repeat; pol gene; polyprotein; provirus;
KW reverse transcriptase; TAR protein; trans-activator.
XX
OS Human T-cell lymphotropic virus type III
OC Viridae; ss-RNA enveloped viruses; Retroviridae.
XX

RN [1] (bases 1-853, 9116-9749)
RA Starcich B.R., Ratner L., Josephs S.F., Okamoto T., Gallo R.C.,
RA Wong-Staal F.;
RT "Characterization of long terminal repeat sequences of HTLV-III";
RL Science 227:538-540(1985).
XX
RN [2] (bases 1-9749)
RA Ratner L., Haseltine W., Patarca R., Livak K.J., Starcich B.R.,
RA Josephs S.F., Doran E.R., Rafalski J.A., Whitehorn E.A.,
RA Baumeister K., Ivanoff L., Petteway S.R. Jr., Pearson M.L.,
RA Lautenberger J.A., Papas T.S., Ghrayeb J., Chang N.T., Gallo R.C.,
RA Wong-Staal F.;
RT "Complete nucleotide sequence of the AIDS virus, HTLV-III";
RL Nature 313:277-284(1985).
XX
RN [3] (bases 503-SS6G) exons only; tat mRNA
RA Arya S.K., Guo C., Josephs S.F., Wong-Staal F.;
RT "Trans-activator gene of human T-lymphotropic virus type III
RT (HTLV-III)";
RL Science 229:69-73(1985).
XX
RN [4] (bases 5778-5082 9 3337-8459)
RA Sodroski J.G., Patarca R., Rosen C.A., Wong-Staal F., Haseltine W.;
RT "Location of the trans-activating region on the genome of human
RT T-cell lymphotropic virus type III";
RL Science 229:74-77(1985).
XX
RN [5] mRNA splice sites
RA Rabson A.B., Daugherty D.F., Venkatesan S., Boulukos K.E.,
RA Benn S.I., Folks T.M., Feorino P., Martin M.A.;
RT "Transcription of novel open reading frames of AIDS retrovirus
RT during infection of lymphocytes";
RL Science 229:1388-1390(1985).
XX
RN [6] 27k antigen cds
RA Allan J.S., Coligan J.E., Lee T.H., McLane M.F., Kanki P.J.,
RA Groopman J.E., Essex M.;
RT "A new HTLV-III/LAV encoded antigen detected by antibodies from
RT AIDS patients";
RL Science 230:810-813(1985).
XX
RN [7] (bases 5778-8933) in HXB-3
RA Crowl R., Ganguly K., Gordon M., Conroy R., Schaber M., Kramer R.,
RA Shaw G., Wong-Staal F., Reddy E.P.;
RT "HTLV-III env gene products synthesized in E. coli are recognized
RT by antibodies present in the sera of AIDS patients";
RL Cell 41:979-986(1985).
XX
RN [8] gp160 and gp120 coding sequences
RA Allan J.S., Coligan J.E., Barin F., McLane M.F., Sodroski J.G.,
RA Rosen C.A., Haseltine W.A., Lee T.H., Essex M.;
RT "Major glycoprotein antigens that induce antibodies in AIDS
RT patients are encoded by HTLV-III";
RL Science 228:1091-1094(1985).
XX
RN [9] regulatory sequences in the LTR
RA Rosen C.A., Sodroski J.G., Haseltine W.A.;
RT "The location of cis-acting regulatory sequences in the human T
RT cell lymphotropic virus type III (HTLV-III/LAV) long terminal
RT repeat";
RL Cell 41:813-823(1985).
XX
RN [10] (bases 1-9749)
RA Van Beveren C., Van Beveren C., Van Beveren C.;
RT "Appendix B: HTLV-3/LAV genome";
RL (in) Weiss R., Teich N., Varmus H. and Coffin J.M. (eds.);
RL RNA Tumor Viruses, Second Edition:1102-1148
RL Cold Spring Harbor Laboratory, New York (1985)
XX
RN [11] trans-activator function and tar sequence
RA Rosen C.A., Sodroski J.G., Goh W.C., Dayton A.I., Lippke J.,
RA Haseltine W.A.;
RT "Post-transcriptional regulation accounts for the trans-activation
RT of the human T-lymphotropic virus type III";
RL Nature 319:555-559(1986).
XX
RN [12] pol coding sequence
RA Marzo Veronese F., Copeland T.D., DeVico A.L., Rahman R.,
RA Oroszlan S., Gallo R.C., Sarngadharan M.G.;
RT "Characterization of highly immunogenic p66/p51 as the reverse
RT transcriptase of HTLV-III/LAV";
RL Science 231:1289-1291(1986).
XX
RN [13] the 23k sor gene product
RA Kan N.C., Franchini G., Wong-Staal F., DuBois G.C., Robey W.G.,
RA Lautenberger J.A., Papas T.S.;
RT "Identification of HTLV-III/LAV sor gene product and detection of
RT antibodies in human sera";
RL Science 231:1553-1555(1986).
XX
RN [14] pol NH2-terminal region
RA Kramer R.A., Schaber M.D., Skalka A.M., Ganguly K., Wong-Staal F.,
RA Reddy E.P.;
RT "HTLV-III gag protein is processed in yeast cells by the virus
RT pol-protease";
RL Science 231:1580-1584(1986).
XX
RN [15] sor 23k protein
RA Lee T.H., Coligan J.E., Allan J.S., McLane M.F., Groopman J.E.,
RA Essex M.;
RT "A new HTLV-III/LAV protein encoded by a gene found in cytopathic
RT retroviruses";
RL Science 231:1546-1549(1986).
XX
RN [16] sor 23k protein
RA Sodroski J.G., Goh W.C., Rosen C.A., Tartar A., Portetelle D.,
RA Burny A., Haseltine W.;
RT "Replicative and cytopathic potential of HTLV-III/LAV with sor
RT gene deletions";
RL Science 231:1549-1553(1986).
XX
RN [17] Sp1 binding sites in the promoter region
RA Jones K.A., Kadonaga J.T., Luciw P.A., Tjian R.;
RT "Activation of the AIDS retrovirus promoter by the cellular
RT transcription factor, Sp1";
RL Science 232:755-759(1986).
XX
RN [18] acceptor and donor splice sites for tat and 27k
RA Arya S.K., Gallo R.C.;
RT "Three novel genes of human T-lymphotropic virus type III: Immune
RT reactivity of their products with sera from acquired immune
RT deficiency syndrome patients";
RL Proc. Natl. Acad. Sci. U.S.A. 83:2209-2213(1986).
XX
RN [19] deletion mutants in the tat gene
RA Dayton A.I., Sodroski J.G., Rosen C.A., Goh W.C., Haseltine W.A.;
RT "The trans-activator gene of the human T cell lymphotropic virus
RT type III is required for replication";
RL Cell 44:941-947(1986).
XX
RN [20]
RA Willey R.W., Rutledge R.A., Dias S., Folks T., Theodore T.S.,
RA Buckler C.E., Martin M.A.;
RT "Identification of conserved and divergent domains within the
RT envelope gene of the acquired immunodeficiency syndrome
RT retrovirus";
RL Proc. Natl. Acad. Sci. U.S.A. 83:5038-5042(1986).
XX
RN [21] art cds boundaries
CC R. Crowl, 05/17/85. R. Patarca provided sites information and a
CC clean copy for [4], 08/16/85. Acquired immune deficiency syndrome

CC (AIDS) is caused by a retrovirus known by several names, perhaps
CC representing two separate strains: human T-cell lymphotropic
CC virus-III (HTLV-III), whose sequence is given below, and
CC lymphadenopathy-associated virus (LAV) are thought to be one strain
CC differing from AIDS-associated retrovirus type 2 (ARV-2) when
CC overall homology is the criterion. Some reading frame similarities
CC suggest that ARV-2 and LAV are more closely related. All three
CC viruses, whose sequences do not differ by more than 6%, are
CC believed to belong to the C type subfamily Lentiviridae, the "slow"
CC retroviruses. The BH10 sequence differs from BH8 and BH5 by 0.3% in
CC the coding regions and 1.3% in the noncoding regions, and the
CC authors of [2] believe that these are stable variants. The 5' and
CC 3' LTRs of BH10 and BH8 were not fully sequenced; the missing bases
CC (433-675 and 9608-9749) were filled in by [2] from the proviral
CC clone HXB2 [1]. The sequence below is that of BH10 with exception
CC of the variation at position 9197 which allows annotation of the
CC 27K coding sequence. The BH8 sequence spans bases 6033 to 9607, the
CC BH5 sequence spans bases 675 to 6033, and the HXB3 sequence [7]
CC spans bases 5778 to 8933. While this entry is offered as the
CC reference locus for the AIDS retroviral sequence loci, no claim is
CC being made that this sequence is more prevalent or typical than
CC others, all of which have been entered in this library with
CC annotation. The HTLV-III genome encodes at least six proteins or
CC polyproteins: gag, pol, env, TAT, 27K antigen and the sor 23K
CC product. The 3' ORF (positions 8797-9447) is truncated in BH10
CC (stop codon at positions 9196-9198), but reads through in BH8 and
CC other sequences to yield what is now called the 27K antigen. The
CC sequence below is from BH10 with exception of the variation at
CC position 9197 which allows annotation of the 27K coding sequence.
CC Additionally there are four short open reading frames, bases
CC 1248-1406, 4442-4642, 5532-5828 and 6095-6340, which are conserved
CC to a large degree. A seventh gene has been proposed based upon a
CC combination of mutational and regulatory evidence: called "ART"
CC (for anti-repression transactivator), its product appears to act
CC post-transcriptionally to relieve negative repression of gag and
CC env production [21]. The exon assignments for ART are putative, but
CC if they are corroborated, the ART protein would be 116 amino acids
CC in length. The mechanism for pol gene translation has not been
CC elucidated: a gag-pol fusion protein is possible; splicing or
CC frameshift have not been ruled out. The viral protease would be
CC determined by the region in question. Approximately two-thirds of
CC the variant sites in the gag and pol genes are "silent mutations",
CC while over half of those in the env gene are not. Reference [20]
CC defines divergent and conserved regions for the env gene. Because
CC of the excessive variability of the env gene, differences between
CC the sequences summarized herein and other env gene entries have not
CC been annotated; only HTLV-III sequence variations have been
CC included in the sites of this entry. Other entries will include
CC information for alignment with this entry, including the Zaire and
CC New York isolate sequences reported by [20]. The TAT protein
CC (trans-activator protein, approximately 14 kd) is an effector of an
CC autostimulatory pathway through interaction with a positive control
CC element, the trans-activating responsive sequence, TAR. TAT seems
CC to be a transcriptional control molecule in HTLV-I, but [11]
CC demonstrates that it is a post-transcriptional regulatory molecule
CC in HTLV-III. Deletion mutants in the TAT gene are incapable of
CC prolific replication and exhibit no cytopathic effects in T4+ cell
CC lines [19]. The TAR sequence(s) are found to be between -17 and +80
CC relative to the cap site +1 (base 455) and are highly conserved.
CC Enhancer sequences which need not be viral-specific are found
CC upstream from TAR [9],[11]. Three tandem decanucleotide Sp1 binding
CC sites are located between bases 377 and 408, of which site III
CC shows the strongest affinity for the cellular factor; intact, the
CC three sites cause up to a tenfold effect on transcriptional






CC efficiency in vitro [17] (The authors demonstrate the existence of
CC Sp1 in a human T-cell line). In addition to the ~9.4 kb genomic
CC mRNA, subgenomic mRNAs of 7.4, 5.5, 5.0, 4.3, 2.0 and 1.8 have been
CC detected. All are probably polyadenylated at the same site,
CC position 9666 below, with a potential polyadenylation signal at
CC 9642-9648, and capped at the same site, position 455, with a
CC potential TATA box at 427-431. The doubly-spliced transcript of
CC about 2.0 kb is responsible for the TAT message at least, and
CC depending upon the acceptor site, also for the sor and 27K
CC messages, given that a single, albeit partial, mRNA exists for all
CC three CDSs. The acceptor splice for TAT is at position 5811 and the
CC putative acceptor splice for 27K is at position 6010; the donor
CC splice site in all three cases would be at position 6079 [18]. The
CC doubly spliced message would also encode the newly proposed ART
CC protein u
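
The reading-frame coordinates quoted in the comment block can be checked arithmetically: an open reading frame should span a whole number of codons, and a stop codon that truncates a frame should start a multiple of three bases after the frame's first base. A minimal Python sketch of that check; the coordinates are this transcript's reading of a damaged printout (in particular the stop codon at 9196-9198 and the third short frame at 5532-5828 are assumptions), and the helper names are illustrative:

```python
def span_length(start: int, end: int) -> int:
    """Length in bases of an inclusive, 1-based coordinate range."""
    return end - start + 1


def is_whole_codons(start: int, end: int) -> bool:
    """True if the inclusive range covers a whole number of 3-base codons."""
    return span_length(start, end) % 3 == 0


# Four short open reading frames listed in the comment block (assumed reading).
short_orfs = [(1248, 1406), (4442, 4642), (5532, 5828), (6095, 6340)]

# 3' ORF and the stop codon said to truncate it in BH10 (assumed reading).
three_prime_orf = (8797, 9447)
stop_codon = (9196, 9198)

for start, end in short_orfs + [three_prime_orf]:
    print(start, end, span_length(start, end), is_whole_codons(start, end))

# The stop codon is in frame if its offset from the ORF start is a multiple of 3.
stop_in_frame = (stop_codon[0] - three_prime_orf[0]) % 3 == 0
print("stop codon in frame:", stop_in_frame)
```

Under these readings every listed frame is a multiple of three bases long and the stop codon falls in frame with the 3' ORF, which is what the description of a truncated 27K coding sequence would require.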
FH






FT   RPT              1    634  5' LTR
FT   VARIANT         82     82  a in BH10; g in H9
FT   VARIANT        101    101  g in BH10; a in H9
FT   VARIANT        103    108  a in [2], H9; g in HXB2 [1]
FT   VARIANT        164    164  g in [2]; t in HXB2 [1], H9
FT   VARIANT        168    168  t in [2]; g in HXB2 [1], H9
FT   VARIANT        178    178  a in [2]; g in HXB2 [1], H9
FT   VARIANT        183    183  c in [2], H9; t in HXB2 [1]
FT   VARIANT        227    227  a in [2], H9; g in HXB2 [1]
FT   VARIANT        231    231  & in [2]; g in HXB2 [1], H9
FT   VARIANT        333    333  c in [2]; t in HXB2 [1], H9
FT   SITE           377    388  Sp1 binding site III [17]
FT   SITE           388    397  Sp1 binding site II [17]
FT   SITE           399    408  Sp1 binding site I [17]
FT   VARIANT        421    421  c in BH10, BH5; t in H9
FT   RPT            454    551  R repeat 5' copy
FT   PRGVRL         454   9666  HTLV3 virion RNA
FT   CAP            455    455  genomic mRNA start (cap site) [10]
FT   CAP            455    455  TAT, ART mRNA exon 1 start (cap site)
FT                              [10], [18], [21]
FT   VARIANT        501    501  a in BH10, BH5, H9; g in HXB2 [1]
FT   SITE           636    653  primer (Lys-tRNA) binding site
FT   VARIANT        654    654  c in BH10, BH5; t in H9
FT   VARIANT        677    677  g in BH10, BH5; ggag in H9
FT   VARIANT        704    704  tga in BH10, H9; g in BH5 [2]
FT   CDS            707   2325  gag polyprotein precursor

FT   VARIANT       12SO   1230  a in BH10; g in BH5 [2], H9
FT   VARIANT       1431   1431  a in BH10; g in BH5 [2], H9
FT   VARIANT       1455   1455  t in BH10, H9; c in BH5 [2]
FT   VARIANT       1611   1611  a in BH10, H9; g in BH5 [2]
FT   VARIANT       1620   1620  c in BH10, H9; t in BH5 [2]
FT   VARIANT       1658   1658  a in BH10, H9; g in BH5 [2]
FT   VARIANT       1662   1662  t in BH10; c in BH5 [2], H9
FT   VARIANT       1675   1675  g in BH10, BH5; c in H9
FT   VARIANT       1722   1722  g in BH10, H9; a in BH5 [2]
FT   VARIANT       1806   1806  g in BH10, BH5; a in H9
FT   VARIANT       1845   1845  a in BH10, BH5; g in H9
FT   VARIANT       1903   1903  a in BH10, H9; t in BH5 [2]
FT   VARIANT       1908   1908  g in BH10, H9; a in BH5 [2]
FT   VARIANT       1923   1923  g in BH10, H9; a in BH5 [2]
FT   VARIANT       1950   1950  g in BH10, H9; a in BH5 [2]
FT   VARIANT       1953   1953  g in BH10, H9; t in BH5 [2]
FT   VARIANT       1988   1988  c in BH10, H9; t in BH5 [2]
FT   VARIANT       1992   1992  c in BH10, H9; a in BH5 [2]
FT   VARIANT       2003   2003  g in BH10, H9; a in BH5 [2]
FT   VARIANT       2013   2013  n in BH10, H9; a in BH5 [2]



FT   CDS           2351   5122  pol polyprotein (NH2-terminus
FT                              uncertain; AA at 2331)
FT   VARIANT       2468   2468  g in BH10, BH5; a in H9
FT   VARIANT       2591   2591  c in BH10, H9; t in BH5 [2]
FT   VARIANT       2600   2600  g in BH10, H9; a in BH5 [2]
FT   VARIANT       2741   2741  g in BH10; a in BH5 [2], H9
FT   VARIANT       2827   2827  a in BH10, H9; g in BH5 [2]
FT   VARIANT       2838   2358  a in BH10, H9; g in BH5 [2]
FT   VARIANT       2980   2520  c in BH10, H9; t in BH5 [2]
FT   VARIANT       3007   3007  tta in BH10, H9; gtg in BH5 [2]
FT   VARIANT       3037   3057  a in BH10; g in BH5 [2], H9
FT   VARIANT       3122   3122  c in BH10, H9; t in BH5 [2]
FT   VARIANT       3222   3222  c in BH10, H9; t in BH5 [2]
FT   VARIANT       3302   3302  ag in BH10, H9; ga in BH5 [2]
FT   VARIANT       3363   33SG  g in BH10, H9; a in BH5 [2]
FT   VARIANT       3385   3385  g in BH10, BH5; a in H9
FT   VARIANT       3395   3395  c in BH10, H9; t in BH5 [2]
FT   VARIANT       3755   3755  a in BH10, BH5; g in H9
FT   VARIANT       3797   3767  g in BH10, H9; a in BH5 [2]
FT   VARIANT       3833   3833  t in BH10, BH5; c in H9
FT   VARIANT       3855   3855  t in BH10, BH5; c in H9
FT   VARIANT       3899   3899  c in BH10, BH5; t in H9
FT   VARIANT       3922   3922  a in BH10, H9; g in BH5 [2]
FT   VARIANT       3934   3934  a in BH10, BH5; g in H9
FT   VARIANT       3954   3954  g in BH10, BH5; c in H9
FT   VARIANT       3968   3962  caa in BH10, H9; tag in BH5 [2]
FT   VARIANT       3977   3977  g in BH10, H9; a in BH5 [2]
FT   VARIANT       3984   3984  c in BH10, H9; a in BH5 [2]
FT   VARIANT       3993   3993  a in BH10, H9; c in BH5 [2]
FT   VARIANT       4010   4010  a in BH10; g in BH5 [2], H9
FT   VARIANT       4016   4016  g in BH10, H9; a in BH5 [2]
FT   VARIANT       4029   4029  t in BH10, H9; c in BH5 [2]
FT   VARIANT       4049   4049  a in BH10; g in BH5 [2], H9
FT   VARIANT       4064   4064  c in BH10, H9; t in BH5 [2]
FT   VARIANT       4116   4116  a in BH10, BH5; c in H9
FT   VARIANT       4167   4167  g in BH10, BH5; c in H9
FT   VARIANT       4292   4292  t in BH10, H9; a in BH5 [2]


FT   CDS           5074   5652  sor 23K protein
FT   VARIANT       5156   5156  a in BH10, H9; g in BH5 [2]
FT   VARIANT       5314   5314  t in BH10, BH5; c in H9
FT   VARIANT       5348   5348  a in BH10, H9; g in BH5 [2]
FT   VARIANT       5401   5401  t in BH10, H9; c in BH5 [2]
FT   VARIANT       5412   5412  c in BH10, H9; t in BH5 [2]
FT   VARIANT       5540   5540  a in BH10, H9; g in BH5 [2]
FT   VARIANT       5628   5628  g in BH10, H9; a in BH5 [2]
FT   VARIANT       5846   5846  g in BH10, H9, HXB3; a in BH5 [2]


FT 

CDS 

5894 

ro?o 

TAT protein, exon 2 (first expressed 

FT 




exon) 


FT 

VARIANT 

5934 

SS34 

a in BH10, H9, HXB3 Join BH5 [23 


FT   CDS           6003   6078  ART protein, exon 2 (first expressed
FT                              exon; putative)
FT   VARIANT       6035   6045  cctcctcaagg in BH10, HXB3 [7];
FT                              gctcatcgaag in BH8 [2];
FT                              g in BH5 [2], clone 12 cDNA [21]


FT 

VAT1I ANT 

6086 

6086 

g in BHIO, BH8, H9 » a in HXB3 [73 


FT 

VARIANT 

6095 

6056 

t 1„n BHIO, HXB3 [73, H9; C in BH8 

C2] 

FT 

VARIANT 

Si 03 

6108 

a iri BHIO, HXB3 [73, H9f c in BH8 

C 2 ] 

FT 

VARIANT 

SI 13 

61 14 

gc in BHIO, HXB3 [73, H9J 


FT 




gtaac in BH8 [23 


FT 

VARIANT 

6 1 34 

6124 

a in BHIO, HXB3 [73, H9f C in BHS 

C2 ] 

FT 

VARIANT 

6152 

6152 

g in BHIO, HXB3 [73, BHS? C in H9 


FT   CDS           6255   8825  envelope protein precursor (env)


FT 

VARIANT 

*: r *—7 * .t 
Qs3> i 

6373 

a in BHIO, HXB3 [73, H9; t in BHS 

[2] 

FT 

VARIANT 

S474 

G474 

t in BHIO, BHS [23, H9; g in HXB3 

C7 ] 

FT 

VARIANT 

6748 

6748 

t in BHIO, HXB3 [73, H9J a in BHS 

[21 

FT 

VARIANT 

6929 

6976 

t in BHIO, HXB3 [73, H95 C in BH8 

[23 




FT 

VARIANT 

7058 

708c; 

a in BH10 5 

H9 5 g in BHS [23) HXB3 [7 3 

FT 

VARIANT 

71 IS 

71 13 

a in bhio: 

HXB3 [73, H9; g in BH 8 [23 

FT 

VARIANT 

71R1 

7123 

cca in BH10, H95 cac in BH 8 [23, 

FT 




HXJB3 [73 


FT 

VARIANT 

7 1V 1 

7172 

gt in BHIO 

, H9 5 aa in BHS [23, HXB3[73 

FT 

VARIANT 

7187 

7187 

a 1 n BH 1 O ? 

H9; g m BH 8 [23) HXB3 [73 

FT 

VARIANT 

7272 

7273 

aa in BH10 

, H35 gc in BH8C23, HXB3 [73 

FT 

VARIANT 

7231 

7281 

a in BHiOi 

BHS [23? H9f C In HXB3 [73 

FT 

VARIANT 

7345 

7343 

g in BHiOi 

BH 8 [23; a in HXB3 [73) H9 

FT 

VARIANT 

7433 

7434 

gtt.taat.agtact.tgg in BHIO* HXB3 [73? 

FT 




and TI3 


FT 

VARIANT 

746 l 

740 ? 

a in BHIO? 

BHS [23; g in HXB3 [73? H9 

FT 

VARIANT 

7433 

7459 

c i n BH10 ? 

BHS [23; a 1n HXB3 [73, H9 

FT 

VARIANT 

7521 

7521 

a m bhio? 

BHS [23, t 1n HXB3 [73, H9 

FT 

VARIANT 

7374 

7574 

t in bhio? 

CHS [23; C m HXB3 [73, H9 

FT 

VARIANT 

7333 

7639 

g 1 n BHIO? 

BHS [2 3 5 a 1n HXB3 [73, H9 

FT 

VARIANT 

YS35 

7 S3 7 

eg in BHIO 

, HXB3 C73, H9J gC In BH8[23 

FT 

VARIANT 

7345 

7645 

a in BH1o? 

BHS [23, H95 g in HXB3 [73 

FT 

VARIANT- 

3030 

SCSI 

ca m bhio 

, BH 8 [23, H9; ac in H 

FT 

VARIANT 

81 27 

8127 

a in bhio? 

BHS [23, H9; C 1n HXB[7 3 

FT 

VARIANT 

0131 

8131 

t in BHIO, 

BHS [23, H9; C in HXB3 [73 

FT 

VARIANT 

;r> 1 ~ > l 

ul 

8135 

c in BHIO? 

BHS [23, H9; g in HXB3 [73 

FT 

VARIANT 

i 

8257 

q in BH10 ? 

BHS, HXB3; a m H9 

FT 

VARIANT 

8273 

0273 

t in BHIO? 

BH 8 , HXB3; g in H9 

FT 

VARIANT 

8334 

8364 

g in BH10? 

HXB3 [73; a 1n BH 8 [23, H9 

FT   CDS           8403   8454  TAT protein, exon 3 (AA at 8410)
FT   CDS           8408   8683  ART protein, exon 3 (putative; AA at
FT                              8411)


FT 

VARIANT 

3422 

8422 

t in BHIO? 

HXB3 [73, Clone 12 CDNA 

FT 




C2135 a in 

BHS [23; c in H9 

FT 

VARIANT 

84F4 

8464 

S in BHIO, 

BH 8 , HXB3, clone 12 cDNA 

FT 




[2135 a in 

H9 

FT 

VARIANT 

6657 

8657 

g in BHIO? 

BHS [23; a in HXB3 [73, H9, 

FT 




c 1 one 12 cDNA C 213 

FT 

VARIANT 

8872 

8672 

g in BHIO) 

HXB3 [73, clone 12 cDNA 

FT 




[213, H9 5 

a in BHS [23 

FT 

VARIANT 

3882 

863,3 

g in BHIO) 

HXS3 [73, clone 12 cDNA 

FT 




[213) H9; 

a in BHS [23 

FT 

VARIANT 

3748 

3748 

g in BHIO? 

HXB3 [73, Clone 12 cDNA 

FT 




C2U ) H9) 

t in BH 8 [23 

FT 

VARIANT 

8753 

8753 

g in BHIO? 

H9S c in BH 8 [23 ; 

FT 




a in HXB3 

[73, clone 12 cDNA [213 

FT 

VARIANT 

8771 

8771 

t in BHIO. 

HXB3 [73, Clone 12 CDNA 

FT 




[ 213 ) H9 5 

c in BHS [23 

FT   CDS           8827   9447  27K protein, exon 3 (first expressed
FT                              exon)


FT 

VARIANT 

8857 

8857 

g in BH10 ? 

BHS, HXB3 , Clone 12 CDNA 

FT 




[213? a in 

H9 

FT 

VARIANT 

,—*r*v ,« 

oo 24* 

6324 

c in BHIO? 

HXB3 [73, Clone 12 cDNA 

FT 




[213? H9; 

t in BHS [23 

FT 

VARIANT 

udb t 

8SS7 

c in BHIO) 

clone 12 CDNA [213, H9; 

FT 




t in BHS C23 

FT 

VARIANT- 

8378 

■ 8978 

a in BH10? 

clone 12 cDNA [213, H9; 

FT 




c in BH 8 [21 

FT 

VARIANT 

8335 

SS85 

t in BHIO? 

clone 12 CDNA [213, H9; 

FT 




c in BHS C23 

FT 

VARIANT 

SS87 

0367 

a in BHIO? 

BHS; c in H9, clone 12 

FT 




CDNA C 213 


FT 

VARIANT 

8384 

3S34 

c in BHIO? 

clone 12 CDNA C213, H9; 

FT 




t in BHS C 

23 

FT 

VARIANT 

3013 

90 19 

g in BHIO ? 

BHS ; a in H9, clone 12 

FT 




CDNA [213 


FT   RPT           9115   9748  3' LTR


FT 

VARIANT 

3193 

3136 

t in BHIO? 

clone 12 cDNA [213; 

FT 




c in BHS C23 

FT 

VARIANT 

31 87 

91 97 

cn i n BHS [23? H9 ? cl one 12 cDNA C 21 3 ; 



FT 

FT 

VARIANT 

321 5 

321 B 

a in bhio C23 

g in BHIO 5 BHB; a in H9» clone 12 

FT 

FT 

VARIANT 

32:; ^2 

3723 

cDNA C213 

ga 1n BHIO? clone 12 CDNA [21]? H3? 

FT 

FT 

VARIANT 

9^*79 

3279 

ag 1n BHS[2] 

g in BH10? BHS? c1one 12 cDNA C21]? 

FT 

FT 

VARIANT 

3233 

9203 

t. i n H9 

t in BHlO? BHS? c1one 12 cDNA [21]? 

FT 

FT 

VARIANT 

3234 

3284 

g in H9 

t 1 n BH 10 ? H9 ? cl one 12 cDNA [213? 

FT 

FT 

VARIANT 

3231 

3201 

e in BHS C23 

a in BHIO? BHS? c1one 12 cDNA [21]? 

FT 

FT 

VAFTTANT 

3237 

3297 

g in H9 

c 1n BH10? clone 12 cDNA C21]? H9? 

FT 

FT 

VARIANT 

3354 

□354 

t in BHS [2] 

g in BHIO? HIVDSM), H9; t in BH8 C23 

FT 

VARIANT 

34 OG 

940S 

a in BHIO? BH8; g in H9? clone 12 cDNA 

FT 

FT 

VARIANT 

3440 

0448 

[21 ] 

c in BHIO; t in BH8 C2]? H9? clone 12 

FT 

FT 

VARIANT 

953G 

3583 

cDNA 

c in BHIO? BHS? c1one 12 cDNA [21]; 

FT 

FT 

RFT 

3570 

3660 

g in H9 

R repeat 3’ copy 

FT 

VARIANT 

36 1 G 

3816 

g in HXB2? a in H9? clone 12 cDNA C213 

FT 

VARIANT 

3321 

3821 

g in HXB2; a in H9? clone 12 cDNA [21] 

FT 

VARIANT 

38G3 

38 Go 

t in BHIO? HS; tg in clone 12 cDNA 

FT 

FT 

POLYA 

3666 

36GG 

[21 ] 

TAT? ART? 27K mRNA exon 3 end 

FT 

FT 

POLYA 

3 GOG 

36GG 

<poly-A site) [103,[18],C21] 
genomic mRNA end (poly—A site) [10] 

XX 

SQ Sequence 9748 BP; 3431 A; 1781 C; 2369 G; 2168 T; 0 other;


Initial Score    = 1872   Optimized Score            = 2176   Significance = 0.00
Residue Identity =  89%   Matches                    = 2243   Mismatches   =  150
Gaps             =  106   Conservative Substitutions =    0
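The summary counts above can be cross-checked with a short calculation. The sketch below is only an illustration: it assumes that residue identity is the number of matches divided by all aligned columns (matches, mismatches and gaps), which may not be the definition the search program used, and it reads the gap count as 106. The function name `residue_identity` is hypothetical.

```python
def residue_identity(matches: int, mismatches: int, gaps: int) -> float:
    """Percent identity over all aligned columns (assumed definition)."""
    aligned = matches + mismatches + gaps
    if aligned == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * matches / aligned


# Counts taken from the summary block; the gap count 106 is a reading of
# the scanned text and is an assumption.
identity = residue_identity(matches=2243, mismatches=150, gaps=106)
print(f"{identity:.1f}%")
```

Under that assumed definition the counts give a value just below 90 percent.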


X 10 20 30 40 50 GO 

ATGA^GTGA-AGef-^iAA-ATATCAGCACTTGTGGAGA-TGG.GGGTGGAAATGGGGCAC-CATGCTCCTTGG 

; I I l I r 1 I I ! 1 t ! ) I I I t I I I I I I l l I lilt II 111 

till I I t t 1 t t I I I I 1 I I I 1 I l I I I I I I I I I II III 


CTAATAGAAA6AGCAGAAGACAGTGGCAAT— 
6230 6240 6250 


-GAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGG 
6260 6270 6280 6290 


70 80 90 100 110 120 

GATAITGATG -AT— CT— G'fAGl GCTACAGAAAAATTGT-GGGTCACAG—TCTATTATGGGGTAC— 

:p rtii i i t i lit) it i i i t t i iiii i i i l l i i if 

ii i i t i iiii :iii ii i i i i * i i i i i i i i i i i t ii 

GGTGGAGATGGGGCAOCATGCTCCTTGGGATGTTGATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAG 
6300 6310 6320 6330 6340 6350 6360 


1.30 140 150 160 170 180 

-CT-GTGTGGAA-C-sGAAGCAA—CCACCA—CTCT ATTTTGTGCATCAGATGCT AAAGCAT ATGAT 

it i i i i i t i t i i i i i i i i i i i i i i i i > ill i i i i i i i 

t i i i t i i i t t t t i i i t i i t i i i i i i i ill i i i i i i i 

TCTATTATGGGG V ACCTGTGTGGP.AGGAAGCAACCACCACTCTATTTTG-TGC-ATCAGATGCT 

6370 6330 6390 6400 6410 6420 


1:30 200 210 220 230 240 

A-CAGAGG-TACATA-AT—GTTTGGGCCACACATGCCTG—TGTACCCACAGA-CCCCAACCCAC 

i 1 ! I I 1 t l r l It l I I l l l l I l III III llll I I I 1 I 1 I 

t I I I t t I t I It l I I I I I I I I III III llll I I I I I I I 

AAAGiCATA fGATACAGAGGTACATAATGTTTGGGC—CACA—CATGCCTGTGTACCCACAGACCCCAACC-C 
6430 6440 6450 6460 6470 6480 6490 


250 260 770 280 290 300 310 

AAG" AG' f AGT AVTGGT A AA'TGTG ACAGAAAATTTT AACATGTGGAAAA—ATGACATGGT AGAACAGATG—C 


lilt 


till it lit 


A-CAAlvAARTA-GTA-TTG- 

£FO <) _ 


-GTAAAT-GiTGACA-GAAAATTTTAACAT-GTGGAA-AAATGAC 

-8530 8540 6550__ 


K. ;5 1 o 






320 330 340 350 360 370 380 

ATGOGGATATfiPTCAEf-TTTATe-EGATCAAAGCCTAAAGCCATGTG-TAAAA—TTAACCCCACTCTGTGT 


i i i I 


AT-GOT PGAA-GAGATGCATGAGGATAT AA-T C AGTTT ATGGG AT C A AAGCCT AA AGCC A- 


i i i i l 

-TGTGT 


390 400 41.0 420 430 440 

TAGTTTAP-AGTGCACTGATTTGG-GGAATGCT ACT AAT-ACCA AT ACT AGT A AT ACCA AT AGT AGT A 

i tii» i i * t it i i til tit it i i * ' * * » i i i i t i i i t i i i t i * i 

t t : i i i t t t tt t i ill lit it ■ ill it t i i i i i i t i i i i > i i i i 

AAAATTAACCCCACTCTGTGTTAGTTTAAAGT6C—ACTGATTTGAAGAATGATAGTAATACCAATAGTAGTA 
£-.620 6630 6640 6650 6660 6670 6680 

450 460 470 480 430 500 510 

GGGGGSAAPTGATGPTGGAGAPAGGAGAGATAAAAAACTGCTCTTTCAATATCAGCACAAGNATAAGAGGTA 


GCGGGASAATGATAA ! EGAGPPPGEPGAGATAAAAAACTGCTCTTTCAATATCAGCACAAGCAT AAGAGGT A 


GS‘-IO 6700 6710 6720 6730 6740 6750 

520 530 540 550 560 570 580 

AGGTGCAEAAAC !AATATGCATTTTTTTATAAACTTGATATAATACCAATAGATAATGATACTACCAGCTATA 


, t , . 1 f | | t • 1 * t I t • ! t t I t ! t ' t I » T * t I I * > t t t I 1 I I t t I t 1 I I 1 1 t t ! I I I t t I t I » I I I ' 1 1 • ! ’ • 

AEG rGCAEAAAGAATP T EG ATTT TTTT AT AA ACTTG AT AT AAT ACC A AT AGAT AATG AT ACT ACC AGCT AT A 
6760 6770 6780 6730 6S00 6810 6820 

530 GOO 810 620 630 640 650 660 


CGTTGACAAGTTGTAnCACCTCARTCATTACACAGGCCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATAC 


CGTTQACOPCTTOTAOCPCCTCAGTCATTACACAGECCTGTCCAAAGGTATCCTTTGAGCCAATTCCCATAC 
6830 3340 3830 6860 6870 6880 6830 6300 


£70 630 £30 700 710 720 730 

ATT A. rTPTGCGCCGGC1 rGGTTTTECGftTTCTftAAATGT AAT AAT AAGACGTTCAATGGAACAGGACCATGT A 

» i r : i • t ; : i t i i : i i » : ; i t t t ; i i t i » i i i i i i l i t i i i i i i i i i i i t t i t » t i i i i t i i i i » i t i t t t t 

i i t i : i i i t : t t i i i t i 1 i t t t i i i i t i i i t t i i i i i i i i i i i i > i i i > i i t t t i t i i i i i i i i i i i i i i i i 

ATTATTGTGCCCCGGCTGPTTTTGCGATTCTAAAATGT AATAATAAGACGTTCAATGGAACAGGACCATGTA 
£9iO 6920 GS30 6940 6950 6960 6970 


740 750 7S0 770 780 790 800 

CAAATGTCPGCPCACnACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTGAATGGCA 

i i i i i i t t * : ; t i i ■ ; r i . i t i t i i t i i i t i i i i i i i i i i i i t i t » i i i ... i i i i i i i i i i i i i i 

t ; t t l t t t l t ' * l ! t ! t l t r t I t t i t i t f ( r ! I l i i t r i i i 1 l I t i » i t . . . t t i i i i I I t i i i i t t 

CAAAT6TCAGCACAGTACAATETAC;ACATGSAATTAGGCCA6TAGTATCAACTCAACTGCTGTTAAATGGCA 
6330 6990 7000 7010 7020 7030 7040 

CIO 820 8210 840 850 860 870 

GTC1Y IGCYlGPAGAACiPGGT PGTPPTT AEATCTGCCAATTTCACAGACAATGCT AAAACCAT AAT AGT ACAGC 

111: I t I t t t I : t I 1 I I t I J I 1 I I 1 1 I 1 t I * I I I I I I t I 1 1 I I I I I I 1 I I I 1 1 I I 1 I 1 1 I I » » I I I I I 1 I I 

till I I i ! ; I t ! t t I 1 1 i i I t t I 1 I 1 I I t I t t I t I I I I i I I i t I I i t t t i I i i I t 1 I » I I I I 1 I i t I I * I 

GTCTGGCAGAAtY^A&PGGTPGTAATTAEATCTGCCAATTTCACAGACAATGCTAAAACCATAATAGTACAGC 
7050 7060 7070 7080 7090 7100 7110 


830 370 ,900 310 320 330 340 

TGA A.CCAATC" r uT AG AA ATTAPiTTET ACAAGACCCAACAACAAT AC AAGA AAAAGT ATCCGT ATCCAGAGGG 

» I l l I t ! t 1 t 1 I I I t t I I t | t | t I t I 1 t I t I t I » t I I I I I « I I I I I I t I * I I I I I 1 « I I I I I I l I I t t I I I 

t i : • ; i i * ; i i i t : i t ; i i t : t i t t t i i i t i t i i t i i t i i i i i i t i i i i i i i i i i i i i i i i i i i i i i i i i i 

TeAACCAPTCT.iTPGOOATTPA^TnTACPAGACCCAACAACAATACAAGAAAAAGTATCCGTATCCAGAGAG 
7120 7130 7140 7150 7160 7170 7180 


350 SCO 070 380 930 lOOO 1010 1020 

GACCAGGEAGAGCATTTGTTACAAT AGGAAAAAT A6GAAAT ATGAGACAAGCACATTGTAACATTAGTAGAG 


l i t i t l 


GACCAEGGAGACvCAT TTGT1 TACAPTAGEAAAAATAGGAAATATGAG AC AAGCACATTGT AACATT AGT AGAG 
71130 7200 7210 7220 7230 7240 7250 7260 


1030 1040 1050 1060 1070 1080 1030 

GAPAPTGCPPTTUCACTTTAPPA7AGATAGCTAGCAAATTAAGAGAACAATTTGGAAATAATAAAACAATAA 


CAA AATG: -3! A AT A AC ACTTT APP AC.AGAT AEAT AGCA AATT AAGAGA AC AATTTGGAAAT AAT AAA AC AAT AA 
VTVO 7280 7290 7300 7310 7320 7330 








1 1 OO 3. 1 3.0 1 i 20 3 3 30 3 140 1150 1160 

TCTT'f ARRCRATGnT5AGGORrv'3r-:nOCCA6RAATTGTAACGCACAGTTTTAATTGTGGAGGGGAATTTTTCT 
I I I I I ! \ ! > I * ! I ! I !!»!!!! ! ! I I I » i i » i * r i I i * i i i i i i * i i i i t * i i i * i i * 

TCT'i f'RAGi :°I ; . i xm 4 IG&RfMR^jnCCCAGAAnTTGTAACGCACAGiTTTTAATTGTSGAQGGSAATTTTTCT 
•^’340 TIRO 7360 7370 7380 7330 7400 

n?i! ’160 11 CIO 1200 1210 1220 1230 

ACTGTAAT n;RPLY4CRACTGTTTRATAGTACTTGGTTTAATAGTACTTGGAGTACTGAA5GGTCAAATAACA 

( I { ! I : ! ! ! ! ! ! I ! I I i ! : i I ! ! * I ! 7 ! t t t i t t i t i i t i i i i t i * i i > i i t i > i t <•>*<*•<*•* • * • • 

ACTGTARTT.TPYUlRRAACTGTTTRATPGTRCTTeGTTTRATRGTACTTGGAGTACTAAAGGGTCAAATAACA 
7410 7420 7430 7440 7450 7460 7470 


1240 1250 2260 1270 1280 1230 1300 

CTGP.AGGRAGToiACACAATGACACTCCCATGC-AGARTAAAACAATTTATAAACATGTGGCAGGAAGTAGGAA 

i : i i ( ! ; t i t ■ ! i I i i !■ I t i I ’ i t 1 t i i i i i i i t i i i i t t t t I i i i i i i i i .. i i i 1 i t i i I i i i 

1 i i i t t i i t ; i t f i i i i t > i i i ! i t t t t i i i t i i i i t i i t i i i t i i t i t i i i i t i i i i i i i i t i t i t » i i 

CTGAAGGAAfvTGACAGAATCACGCTCCCATGCAGAATAAAACAAATTATAAACATGTGGCAGGAAGTAGGAA 
7480 7430 7500 75 3 0 7520 7530 7540 

1310 1320 3330 1340 1350 1360 1370 1380 

AAGCAATiv:Y-V r OOCCi;"CCCATCAGCGGiACAAATTAGATGTTCATCAAATATTACAGGGCTGCTATTAACAA 

i i : i t t i i t i ; t i t t t i i r t i * i i i t t t i i i i i t t t i i t i t t i i ■ i t < i i i i t i ■ i i t t i i i i t i i i i i i 

i t i i t t t i i t t i i t t t i t i t l i i t i i i i i i i i i t i t i t i t i i t i i i i i i f l ( i t i t i i t i i i t i i t i i i i i 

AAGCAATGTATGCCCC fCCGATCAQTGGACAAATTAGATGTTCATCAAATATTACAGGGCTGCTATTAACAA 
7550 '’550 7570 7580 7590 7600 7610 7620 

1330 1400 1410 1420 1430 1440 1450 

RAG'-. rGG’R-GTOATAP.CAACAATGGGTCCGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGA 


GAGA T 


.GTGG' 

7630 


iiit i i i : i t : i t i i t i i i t i i i t i i t t i i i i i • i i i i > i i t i i t i i » i i i i i i > i i i ■ 

i : i i i : » i i i i i i i i t i i i i i t i i i t t i i t i i i i i • i t i i i » i t i i t i i .. 


AATRC: 


CAACAATGAeTCCGREATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGA 
7640 7650 7660 7670 7680 7690 


14-50 3.470 14-30 .1490 1500 1510 1520 

GAAGTGAATYA7ATAO; ATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAA 

i i i l i : i i i t i: t i i i i i ■ t ; i t t t i i t i i i i i t l t i i i i i i i i i I i i t i i i t t i i i i i i i i i i i i i i i i 

t « t i i : i i i * i i i i i t t i i t t i * t i i i i : i t i i i t t i t t t i i i i i i i t i i t t ■ i ■ i i i i i i i » i i i i i i i i t 

GftAGTGAATTA i ATA-V-Yf ATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAA 
7700 77 AO 7720 7*730 7740 7750 7760 

1530 15.40 1550 1560 1570 1580 1590 

GAG: GGTC43AGAGR0-AAAAAEAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAA 

t t t t : i i i ^ i l ’ : i t : t ! i t i ; : i i i i i t i i i i i t i i i i t i i i i t i i t t t i i i t i i t i i i i t i i i i i i i t i i 

t i i i t : i i t t t i i i t : t t i i i r t f > i t i i i i i i i t t i i i i > i t i i t i i t i i t i t i t i t t i i t i i t i t t i i i i 

GAGTGETGC/-T-.AGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAA 
7770 7780 7790 7800 7810 7820 7830 


1600 5.610 1620 1630 1640 1650 1660 

GCACTATG6ilX:3CACGGTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGC 

i i : i i i ; t i ! i : t t t t > t : t i t i : i i i t t ) ! i i l i i i i i i l t i i f l i i i i i i i i ■ i i i 1 i l i i i i i i 

itiittttttttii t i t i t i t t t i i i i i t t t i t i i > i i i i i f i i t > i i t i i ■ > i i i i i t i t i i i i i t i i i 

GCAC fA'i'GGG3-C:iCAGGRTCAATGACGCT GACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGC 
7S40 7350 7360 7870 7880 7890 7900 


1670 1680 1690 1700 1710 1720 1730 1740 

AGAACAA''TG5 T GAGGGCTATTGAGGGGCAACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGC 

1 i t i i : * I t * t ; i 1 t 1 t I t i 1 i i i i > t i t I i • i i » t I i t t t f I « t t t i a i i I I i I i I I I I i i i • I * I I i t 1 ft 
t t I i t t t i t t i i i t t i I t i i ft I I 1 i I t 1 1 I I 1 r t * t i i * I 1 I t t t * I i I I t i i t I I i I I t * I i t I I < I » » t I 


AGRRHAat FTGG i GRGGFCTATTC,RB : GGGCAACAGCATCT GTTGCAACTCACAGTCTGGGGCATCAAGCAGC 
7910 4920 7930 7940 7950 7960 7970 7980 


1450 !760 1770 1780 1790 1800 1810 

TC5.PRGC.ARGAATCC7GGCTGTGGAARGATACCTARAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTG 

i i t r i i i t < ; : i i < i i i t i i < i t i t i t i i t i i i i i t t i i t ) < i i « i i i i i i ■ < i i i ■ t < * • • ' * ’ * i i • t i i 
t i t t i : i t i ; i t i r ! i i l i t i i i i i t ; i t i i t i i i i t i t i i l l t i i i i t i i i i i i t i i t i t l l i i i l l i i i t 

TCC^a^C^r>C?^nTCr:Tn5GCT!riTGf?HAAf?ftTPiCCTPinAGGATCftPlCAOCTCCTGGG(iftTTTGGGGTTGCTCTG 
78^0 OOOO 3010 S020 8030 8040 8050 


SRARRCTCYYT 


1850 

T^CR^CRC'i fiCl 


1840 1850 1860 1870 1880 

L:, TECGTTeSRRT 6CTR6TT6eRGTRRTRPiRTCTCTG6RRCA6RTTT6GR 


% t i i ( i t t i i t i t i t t i i i * t i i t i ( i t t t t t * t t i ) ( t i • i i i ■ i « i i r * t t » t i ... i i i i i i i i 

t I ( T * t 1 ) I t ' I • • ! I l t l t 1 ( t I 1 1 I I I t I t t t I 1 t t I » I I I I 1 I 1 t i I » I t I I I I I I I I I 1 \ 1 I ) I t I I 


GAARRCTCATTTGCAYCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGGA 
_~5:VQ_3030_C-IORO_8100_ 8110 _8120_ 


■■•r. 


9060 




ic5-0 13CO 170 0 1920 1930 1940 1950 

PiT^r.C«TO?.C^''RES7C:fE«GTfSEG;.^fti5i^eftftftTTft^AftTTfO=tCAASCTTAATACATTCCTTAATTQAAGi 
J I 1 I r r • 1 * * ! ; ! ( t ! ! t ! : ; : I i t I ! I * t » i i • i t * i t i i i i t * i « » < * • • * ' > I * * * ' 1 ' * ' * 1 ' * • 1 * 

ATA^OftT&ftCrnae^-lt.-'BPiKTfaSli.ftCAQftQftPiATTftACftATTACACAAGCTTAATACACTCCTTftATTQAAe 
£130 0140 3130 0160 8170 8180 8190 

igp<) 1370 1 Oil: '(j 1SSO 2000 2010 2020 

067; IGCAC/Y- CC?Ai5.ttf \AC968AAGr.ATGftACAA6AATTATTSGAATT AGAT AAATG6GCAAGTTTGTGGAATT 

; ; ; ; ! I ; I I ! ! ! ! ! I ! ! ! ! I ! I I I ! ! ! I ! ! 1 ! ! I I ! ! ! ! ! ! ! ! ! < t i i i i ... » * * * » • * * * • * * i * * 

AAT 53 7C8 A AACGAGCTinEAAAr.GnATGAACAAGAATT ATTGGAATT AGATAAATGGGCAAGTTTGTGGAATT 
320 O' 8210 8220 8230 8240 8250 8260 

2030 20CO 2050 2050 2070 2080 2090 2100 

GET" V.OfVlC' 75.05"6-1 fTRi2T;T6"’X:lETATATAAAAATATTCATAATGATAGTAGGfM3GCTTGGTAGGTTTAA 

I I | ' I I ! t I ! ' ! : ! I : r I ; I : i • ; ( f I t t I I t t i I I t I I I.I I t I 1 I I » I 1 I I I I I I I I I 1 I I • • I < 

f3GTTTAAC6T6A96.ft6TT6rrC7T?TT:sGTATATAAAATTATTCATAATGATAGTAGGA6GCTTGGTAGGTTTAA 
3270 '950 8390 8300 8310 8320 8330 8340 


oi . u ) >120 2130 2140 2150 2160 2170 

P-CCt1 OP-7 7 1 TTT PO 1 7 Y7CTT TG7'CTAGTGAATAGAGTTAGGGAGGGATATTCACGATTATCGTTTCAGACCG 

\ \ j ! \ \ J \ i ’ ! | J \ ! \ \ ! , » \ , \ j t t » j i i i * i t ( t i i i i i » i * i > > * 1 t t t t t i • • i * « * > t i ' i i » i * * 

r^p^rnGTl fTTC CTG; fV VOTTTCTGTP 1 GTGOP 1 TPiGAGiTTPtGGCftGGGftTRTTCROCATTRTCGTTTCRGRCCC 

.360 8370 8380 8380 8400 8410 
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2210 


2220 


2230 


2240 


i t ( * i 


i t i t t i i t i t 


£440 8450 8460 8470 8480 

,2270 2280 2230 2300 2310 


;.o 


j ; t i t t i 


i t i t i t i t i t t i > i i i i i t i > i t i i i i i i i i t i i i i i t i i i i i 

t t i t i i i t t i ... i t t t t i i > t i i i t i ■ t i i i i t i t i 

'AGCACTTATCTeGGACGATCTGCGGAGCC-TGTGCCTCTTCAGC 
3520 8530 8540 8550 

2350 23 S 0 2370 2380 


t i ( • i t r j » * t t i t ) t i i t i t i ( t i i i 1 i i I I t l i i i i i i l l > t i i i i 

t t i r : i i t i i i t i i i i i i 1 i I i i t I » t 1 i i i i i i t t i i i i t t i t t i i t 


TPiCi OCCPOT iX-p.Gr.CACT'rACTCTTGl-'.TTGTAACGAGGATTGTGGAACTTCTGGGACGCAGGGGGTGGGAA 
SSr-.O .-ISVO 85 i :-0 3590 8600 8610 8620 

2390 2 .’O' 241C 2420 2430 2440 

GO : rc.^.r-Wr-Tf i C-f-.;Y!;:GA,--Vrr.TC:G7ACAGTATTGGAGTCAGGAACTAAAG 


gcgg rr;A6•• rr;-r. rt.-:-n•.* paatc:tc ictacagtattggagtcaggagctaaag 

8 S 30 COCO 7:050 8660 8670 X 
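To relate an alignment window to the annotated features of the matched entry, the coordinate ranges can be intersected. The following is a minimal sketch under stated assumptions: the `Feature` type and `overlap_length` helper are hypothetical names, the env CDS range 6255 to 8825 is taken from the feature table of the entry, and the window 6230 to 8670 is an approximate reading of the first and last ruler values printed under the alignment.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    key: str    # feature key, e.g. "CDS"
    start: int  # first base, 1-based inclusive
    end: int    # last base, 1-based inclusive
    note: str   # free-text description


def overlap_length(feature: Feature, window_start: int, window_end: int) -> int:
    """Number of bases shared by a feature and an inclusive coordinate window."""
    lo = max(feature.start, window_start)
    hi = min(feature.end, window_end)
    return max(0, hi - lo + 1)


# env CDS as annotated in the feature table; the window bounds are assumed
# from the ruler lines of the alignment, not stated by the program.
env = Feature("CDS", 6255, 8825, "envelope protein precursor (env)")
covered = overlap_length(env, 6230, 8670)
fraction = covered / (env.end - env.start + 1)
print(covered, f"{fraction:.0%}")
```

With these assumed bounds, most of the aligned region falls inside the env coding sequence.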


3. KUNZ-0158- GL.3.5 557' 

HlVFiCLCO :-l! iifipv: 1. y n.-^i 7:-o -•->r,a : ’.athy virus (MAL isolate). comp let 

ID HIVMAL standard; RNA; 9229 BP.

XX 

AC XO44::.0; 

XX 

DT :v- r.'C i • 1150 i nccvyorr.'ted) 

XX 

DE *-!-. n;-n i s M . p 5 c - 5 <; n 70 pathy v i. r us < MAL isolate )? complete genome . 

XX 

KW acquired immune deficiency syndrome; env gene; gag gene; genome;
KW long terminal repeat; pol gene; polyprotein; provirus;
KW reverse transcriptase.

XX 

OS i"" 1 ' ■ \ )7l- •*' * / i ( 1 











Retrovl r 1 dae. 


□C 
XX 
RN 
R A 
RT 
RT 
RL 
XX 
CC 

cc 

CC 

cc 
cc 
cc 
cc 
cc 
cc 

XX 

FH K 
FH 

FT p. r 

ft rrt 

FT Si : 

FT 
FT 
FT 
FT 
FT 
FT C* 
FT 

FT CU 


w M to ,r-y VS' i L: V/lR'GSC" 

RN [1]
RA Alizon M., Wain-Hobson S., Montagnier L., Sonigo P.;
RT "Genetic variability of the AIDS virus: Nucleotide sequence
RT analysis of two isolates from African patients";
CC Acquired immune deficiency syndrome (AIDS) is caused by a
CC retrovirus, known by several different names, probably representing
CC two strains: human T-cell lymphotropic virus-III
CC (HTLV-III) and lymphadenopathy-associated virus (LAV) are thought
CC to be one strain, and AIDS-associated retrovirus type 2 (ARV-2) the
CC other. These viruses, whose sequences do not differ by more
CC than about 5%, are believed to belong to the retroviral subfamily
CC Lentivirinae, or "slow" viruses. For the details of the annotation
CC and for the pertinent references, see the HIV reference entry.


CHS 
r: . 


Q.'S 


A 1 

370 


4 £ ! G 


1 c t 

l S3 
1 3GV 
4S7 '< 

3 1 G4 


FT 

FT 

FT 

FT 

XX 

S© 


CGG 

RK. 

isi c ( j La w.* t i ? Vit 




v_j : ' *• “> cjC’ . * 

7G0G 3007 
8330 SCuR; 

-d hptt: 


Description 

R repeat 5' copy
5' LTR

primer (Lys-tRNA) binding site

gag polyprotein

pol polyprotein (NH2-terminus

uncertain; AA at 1963)

sor 23K protein

urfC

tat protein, exon 2 (first expressed
exon)

envelope polyprotein precursor
tat protein, exon 3 (AA at 7960)

27K protein
3' LTR

R repeat 3' copy


BP 31 55 fti 1627 CJ 2204 GS 2043 T; 0 other; 


Initial Score 
Residue Idant; 
Gaps 


936 Optimized Score 

3 t t ( y i c!-' J i Q S 


2041 Significance 
2066 Mismatches 


Cnnscrvative Substitutions 


X 




30 


40 


50 


60 


0 . 00 
349 
0 

70 


ATGftGAG' VGA, 1 ; Lf 'iift, vft'l iTfCALCACTTGTGGAGATGGGGGTGGAAATGGGGCACCATGCTCCTTGGGATA 


ft 7 boTAG 7 'in--> opturACfT-GOLBiftATTATCftAAft—CTGGTGGftGftTGGGGCftTGftTGCTCCTTGGGATG 
5800 T:. ■ I ;-J 20 5830 5840 5850 5860 

SO SO 100 UO 120 130 140 

TTGTVdATd Of ■ ftffT . 57 ft Oft GT.ftftftft'l TGTG.GGTCftCftGTCTATTATGGGGTftCCTGTGTGGAftGGftftGCft 

1 t 1 t » t l t • i t 1 t ? r t 1 : 1 t t r 1 1 1 t 1 1 1 1 1 1 1 1 1 i 1 t 1 1 ( 1 * t f 1 » 1 « 1 * • t 1 1 ****** 

1 1 1 1 t 1 t 1 ■ 1 t 1 t 1 : 1 1 t t 1 t t t t t 1 » 1 1 1 t 1 1 1 1 1 1 t t t t 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ****** 

TTGA flSftC:: IT. 1 ftGTfVi TGCAGAAGATTTGTGGGTTACAGTTTATTATGGGGTACCTGTGTGGAAAGAAGCA 
5870 TOtO 5890 5300 5910 5920 5930 


Us: 170 180 190 200 210 

f ;'; 'G TGC ATdP.G ATGCT AA AGCAT ATG AT AC AGAGGT ACAT AATGTTTGGGCCACACAT 


i t 1 1 l l 


t 1 l l 1 l 1 t 1 l 1 


ACCftCTAO fen iT'f - G'ie-iGAT oo.gatgctaaatcatatgaaacagaagtACATaacatctgggctacacat 

5940 f-rro 5980 5 S 70 5980 5990 6000 6010 


;>~l. * 

GO'.ho.rt 

T * I ! F 

LO. ! • 


'••30 240 250 260 270 280 

1.Cl li .-(CSGCitPv ,nCAC:A asaagtagtattggtaaatgtgacagaaaattttaacatgtgg 

I ? 1 t f 1 I I I I I I I t I I I t I I I * t III !*<*** t * t * I * <****••****• 

• IT T I ; I I I I I I T t I t I I 1 III I III I 1 t 1 I 1 1 I I I 1 1 I * 1 1 I 1 1 I I 1 1 1 

dP " lO.'-'CCCC'PrcrjCrlC AAGAAATAGAACTGGAAAATGTCACAGAAGGGTTTAACATGTGG 
8030 9040 SOSO 6060 6070 6080 


▲ 







230 i IO 320 330 340 350 360 

'■<*! I -r;iTf>l inPiCOGm t:CRTGRGCRTATRRTGRGTTTRTGGGRTCRRRGCCTRAAGCCRTGTGTA 


I I * : • ' I * * * * ♦ r I ’ 

AA -)r\{ i'l AAOATv **7iT* r r^ inG 1 
COSO fli^O 


A^AYCCATbAGSA'rATAATCACiTTTATGGGATCAAAGCCTAAAACCATGTGTA 
r.iiO 6170 G130 6140 6150 


APAT rp,'. 


TJAC 


ttc" 


FT 


330 400 

AAPGTGCAC1 GATTTG- 


410 420 

-GGGAATGCTACTAAT-ACCAATACT 


, . f : i t i ; t : r i . : i I < t t t : t • r I i I I t I I I i * I I i I I I I I ill i i I i i i » 

npr'r : r.r.o:'.err. r rtgt cr.c: TTRnRCTGCRCTRftTGTGRRTBGQRCTGCTGTeRRTGGeRCTRRTGCT 
si co r.i t-: s ?.;:o siso 6200 6210 6220 

■■•AO ^f-tO 460 470 480 490 

067; V V!Y-'li: HJfV4 06 ?''6 f Oil .U, j. .1 vRR^TGriTGRTEiGRGRRRGGRGRGRTRRRRRRCTGCTCTTTCRRTRTC 


GGGRI-iT-fV-Ti OC:.j RCTRRTIGG^RRTTGRRRRTGGRRRTTGGRGRRGTGRRRRRCTGCTCTTTCRRTRTR 

6230 n.'r r ‘AO 6260 6270 6280 6230 


500 5’.C- 

ReCf.CRO'L:N6 '-'f-'S 


620 530 540 550 560 

fOOl'-iO 1 TiCRCRRRGRfVT RTGCRTTTTTTT RTRRRCTTGRT RT RRT RCCRRT R- 


, ; | , (II j : * t : i I * ; i i i i i I i i i i i i i t t i i t t i i i i i i i i i i i 

RCT»'( :Ru-"! ‘REV f RRG ; 7 .RTRRRfT 'CG-RRGRRTR7 '(SCRRCTTTTT RT ARCCTTQRTCT ASTRCA ARTflGAT 

6300 6330 6340 6350 6360 


570 6"<: '7 

GRTfVYlfGR T OCO (-K'- 7V! 

Gftl i-.l 7 ! I LO-. t i- 5 '-'. i i ’■!>.? ; -.13 I T •-) 11 

63 ?u r 


GOO 610 620 630 

V TTGRGRRGTT{5TRRCRCCTGAGTCRTT RCRCRQSCCTQTCCRRRGGT A 

• t t * 1 1 » t t 1 1 1 1 t 1 t t 1 t t 1 t 1 1 1 1 1 t 1 1 1 1 1 1 1 » 1 1 1 1 1 1 

• ! J t 1 f t * | , f tlltflll I t 1 I | | I 1 I 1 I I I I 1 t I f I I I I I 

:L-G TRRTRRRTTGTRATPiCCTCRGTRATTRCRCRGGCTTSTCCRRRGGTR 
7 0 mOO 6410 6420 6430 


640 650 SoG S70 680 690 700 710 

TC : I i'TOOi rG.YYtT’ ;L-CATAC'-" 1 1 RTTGTGCCCCGlGCTGGTTTTGCGRTTCT AA AATGT A AT AAT AAG ACG 

i ; - t l t • - ; ’ 1 1 1 t t 1 ; t t ! 1 1 i ( 1 1 1 1 1 i 1 (iitiiliiii ( ( 1 ( ( 1 1 ( it|((( iiii(l 1 

i : > : - ' 1 1 1 t t : t : 1 t 1 I t t 1 1 > 1 1 1 1 ( ( I 1 1 t t 1 ( »***(«*( ***«*< 1 1 1 ( I 1 1 

ACG v • ri.-! oOCA <ftG.y, TAT ! GTGCCCCRGCTGGTT'T TGCA ATTCT AA AGTGT AATGAT A AG AAG 
6440 3-7 . ) G- : 6 > 6470 6480 6490 6500 


720 

it;:' RfCf-.ORC 


T'l ^ 
6510 



730 740 750 760 770 780 

: CATL . •••' '' AATciT CAGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCA 

• t 1 : t • i ( 1 t 1 1 1 1 t ... 1 1 1 1 1 1 t 1 1 1 1 1 1 1 11 tit 

• • 1 t 1 t 1 t 1 * j 1 1 1 t 1 t 1 1 1 t i 1 ■ 1 1 1 1 1 1 t t t 1 1 1 1 1 1 1 1 1 11 111 

■ .TGVi LG OA m ,Ah7u.TC AG I’AC AGT AC AATGT ACACATGGAATT AAGCCAGTGGTGTCA 
G530 £540 6550 6560 6570 6580 


7 :;J 

ACTOAA I OC V 


ACTGY-VACT65T; 


v.yU 810 320 830 840 850 

fv * : -WGw 1 , <37AulJAiTAAGAA.GAGGTA6TAATTAGATCTGCCAATTTCACAGACAAT 

; ( 1 ( 1 ( t • t 1 1 1 t 1 « 1 1 » 1 1 1 1 1 1 11 t ( 1 1 1 t * 1 1 ( 1 (it I 1 1 1 * 1 1 t 1 1 1 

1 1 1 r t t 1 ; ■ 1 i t ! 1 : * • 1 1 t ; 11 t t t 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

f'-,’'.A 030043 • 1 rrAf- GAGAAGAARAGATAA7 GATTAGATCTGAAAATCTCACAGACAAT 
6600 6610 6620 6630 6640 6650


I3CTAAAAU!'- v T .-‘RTA'-'TRC<Mt! 


111:1 

t()»:* 




:0 880 900 910 920 

•< ’! OACC ;n ATGTGT AGA A ATT AATTGT AC AAG ACCC AACA AC AAT ACAAGA 

r rt It lilt) 1 t 1 f t » i t I 1 I I I 1 1 II ■ I I I I I I I I 1 I I 

It It I I 1 » 1 t I I I I I l I I I I * 1 I » 11 I I I 1 I I I I t I I I 

: * i aataaa actgta acaa't taattgtacaaggcctggaaacaatacaaga 

3570 6630 S7GO STIO 6720 


33G :• - O 

AAPA.GTAA r:C5TAT 

aga , :;;33.''/i acat tt 

67-0 770 




360 670 380 330 

ICCAbiGG AGAGC ATTTGTT ACAAT AGGAAAAAT AGGAAAT ATGAGACAA 


x: 


AiAGGCA; AGCACTCTAT ACA AC AGG3AT AGT AG6AGAT AT AAGAAGA 
6750 6760 6770 6780 6790


1 OOO ; O ' 1 7 'i} 

f 5 CA<;;r.:TTn \ ‘ * 'ATTAf■?iTA , . ;A! 

ii) * * t i t * i i f i i i 

l*t *1*111 • * ' \ ' * 

GCATfYTTi-.i ) AV/i'fVi'": rf\\i^.'Y-y; 

^.7 10 


c 030 1040 1050 1060 1070 

: 7 AAA* i W : A ATGt.tf ACT 1 T AAA AC AG AT AGCTAGC A A ATT A AG AG A AC A A 


:';FAA1Cf5:7ATAAAACTTTACAACAOGTA6CTGTAAAACTAGGA*A6C“" 


AC:":o 


6840 


6850 


6860 








•.110 1120 1130 1140 

J—'-VrO JTC AGiGAGf.viGGACCCAGA AATTGTAACGCACAGTTTT 


I I I I I I I I I t t I I I 


It I I I I I 


-*!"■ rO (f> ' '"* * ' A 4 


r npff ! TTTRATTCATCCTCAGGAGGGGACCCAGAAATTACAACACACAGTTTT 
> 5!-aO 6300 6310 6920 6930 


"it .. I Vi • | 


A A! '< i-vr,is| :(., -I- • j 

r-.M-i.fi ? ■. v“i 


I c i .'mv rr ; 


1100 1190 1200 1210 

-‘ATTCnACACAACTGTTTAATAGTACTTGGTTTAATAGTACTTGG 


VU' r:'i: ITGTAFVrACATCAAAACTGTTTAATAGTACATGGCAGAATAAT-GGTGC 
RSP/i 6970 6980 6990 7000 


1 2m:0 
A07 ' -: - 


L7'-'0 1250 1260 1270 1280 

rA'.-f. : I nAPMRARCYnYACACAATCACACTCCCATGCAGAATAAAACAATTTATA 


i f f I l I i I i 


iY-U ;CA. ; ,9f-;S I CAACTGGT AGT ATCACACTCCCATGCAGAAT AAAACAAATT AT A 
707? 7030 7040 7050 7060 7070 


AADATGTPV 


1320 1330 1340 1350 

-o ! -r.,/>;:. j ^-f-r^vA70iTATGCCCCTCCCATCAGCGGACAAATTAGATGTTCATCAAAT 


AAT'Yi AY 


' R|- V .b.67,''.VGTA reCC'.CCTCCCATCGCAGGAGTCATCAACTGTTTATCAAAT 
•T/90 7’00 7110 7120 7130 7140 


1360 * 

ATTm ■' 


1.300 1390 1400 1410 1420 

"^: , *3Rr:l':'?. ; LA7'- i r-lTGi.iTAftTAACA-ACAAT—GGGTC—CGAGATCTTCAGACCT 

rililft'il iiiili ill! I (lilt I I I lilt lit I I i t I I 

I : • ■ , i i ■ 1 • t , I ! I 1 1 * 1 ! 1 I I t 1 1 I I I I t I I III I I I I I I 

,V,h._.R;V O, ,tVr-|T;..G7GC!AAA [ AGTAGTGACAA1 AGTGACAATGAGACCTTAAGACCT 
1C TIT;.; 7! 80 7190 7200 7210 


1 <7 if.' 


1460 1470 1480 1490 

-7 .SA A(-iTGAATT AT AT AAAT AT AAAGT AGT AAAAATTGAACCATT A 


GGAcGAGGAGA": f V; ; • ■ \GGGA.f if VY:" i bib,;' Yf AAGTG AATT AT AT AAAT AT AAAGT AGT A AGAATTGAACCCCT A 


V- -'o 


7260 


7270 


7280 


1500 1510 1520 1530 1540 1550 1560

RG*i : r ~ r ’-^ oy . r . c & cn .'''gaagagtggtgcagagagaaaaaagagcagtgggaataggagctttg 

• ; ; . I ! ; I I ; ; I : l : : I I : I I I . I I ! I I I I » I I I I I I I I I .I I I I I II 

t , i < I t i i : ; > - ' : I i 1 ' : : t r i I i t : f i t I t I I t | i i i i I I i I t i I t I I I III i I i i I I I I i 

RGAiT:'; •AGOA'.J-: -0C 3 ,AG6t77 >91 fAGArtGAS' l GG7G GAA.-iGAGAAAAAAGAGCAATAGGACTAGGAGCCATG 
‘707, • .r;V- 7V,,. 7320 7330 7340 7350 


1570 


‘ ■ 5.590 1600 1610 1620 1630 1640 

:pi r.iL-AAGCACTATCiGGCGOACGGTCAATGACGCTGACGGTACAGGCCAGA 


111 till! 


TYO, I"l iv 1 C;'. • TTc-C-GAGCAGX. A6GAAGCACGATGG'GCGCAGCGTCACTAACGCTGACGGTACAGGCCAGA 
7360 7370 7380 7390 7400 7410 7420 7430

j. GOO ; 660 1670 1680 1690 1700 1710 

CAP:" 'i-A-'- i or-'VO-’-v'.Tr'Tr riOYOCVAGnAGAACAATTTGCTGAGGGCTATTGAGGCGCAACAGCATCTGTTG 


CAY'! T. 


KG i -••TAGlG' ::.' ' '.CAGCAAAACAATTTGCTGAGGGCTATAGAGGCGCAACAGCATCTGTTG 
7450 7460 7470 7480 7490 7500 


Ch' . TOY! 


1740 1750 1760 1770 1780 

11 tlGCATC: .OCnAGCTCCAGGCfAGAATCCTG6CTGTGGAAAGATACCTAAAGGATCAA 


QO;’..;, ! •; . -- i :t Y'-T 1 vfC.-YYiCATTA Mpi 1AGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTACAGGATCAA 
7510 7520 7530 7540 7550 7560 7570


1810 1820 1830 1840 1850

0190 r ":~ "rCTivGAAAACTCATTTGCftCCACTGCTGTGCCTTGGAATGCTAGTTGG 


i t i i i i i i i 


CfciM., 7,'Y 0 AG; >' - .ATt 


:TT-: i. fCT o.G A A AAC ACATTTGC ACCACATTTGTGCCTTGGAACTCT AGTTGG 
OO 7610 7620 7630 7640 







I860 J £ 7u 1880 1300 1910 1920 

AGTf-7-Y? ARA’I CO f :■iGuARCRi ’ft f VI GGR ATARCATGACCTSGATGGAGT6GGACAGAGAAATTAACAATTAC 

( t * t t i t t i t ■ i t : l t r i t t t t i i i i > i i i t i t i t i t ( t i • ( t t i t ■ I t t f I t I t i i i i i i i 

11111*1 * 1 * t * * 1 * 1 i I 1 I ’ t I t t 1 1 t I 1 I 1 t t I t t I t I I t I I I t t I t I 1 I 1 I I I I I 1 t t I 

AGTAATRGATC n CTOi ;ATGAt>Yi 1 TGGAATAATATGACCTrGGATGCAGTGGGAAAAAGAAATTAGCAATTAC 
7650 7660 7670 7680 7690 7700 7710

1930 1940 1950 1960 1970 1980 1990 2000

RCOOOt-Tr : 0 . 0 ^*'' ''V. L 2 T 1 /'.AT GAP.r^ATnGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAA 
II) t t ; i I ; i : t ; t i i > t i i i i t t ! i i i i i t i i i t i t t » i i t i i i i i i i t i i ■ t t i i i i i i 

ACAGGGRTAftTRTACAACTTRRT- r SAA6ARTCGCAAATCCAGCAAGAAAAGAATGAAAAGGAATTATTGGAA 
7720 7730 7740 7750 7760 7770 7780 7790

2010 2020 2030 2040 2050 2060 2070

TTRGRTPmTG f .-.-f5iCRRGTT'i'l-.i TGGAATTS6TTTAACA7ARCAAATTGGCTGTGGTATATAAAAATATTCATA 

i : it ; i * ; t i r * t • * i : • t r i t * r t t i t i t t i i t i i ) t > t i I i i ..* « » t i I t i I i t t I I 

TTGGRCRRlr T,. vr ViCRRiTi ri Tl ’ i OGRRTTItGTTTR5CRTRTCRRRRTGGCTGTGGTRTRTRRGRRTRTTCRTR 
7800 7810 7820 7830 7840 7850 7860


2080 2090 2100 2110 2120 2130 2140

RTGRTPGVOL' OOnO, V! TGG! AG'“ ’ 7 TAASRA TRGTTTTTGCTGTRCTTTCTRTRGTGRRTRGRGTTRGGCRG 

11 : 1 t J l i t ; t I 1 I 1 i i 1 I t i i i I 1 I t ( i 1 I 1 1 I I I I I 1 I I I till i i i I I I i i i i t i I I t 

ATAGTAG 1 PET OGG57 TROT RF:3 : TTAAGAATAATTTTTSCTGTGCTTTCTTTRGTRRRTRGRGTTRGGCRG 
7870 7880 7890 7900 7910 7920 7930


2150 2160 2170 2180 2190 2200 2210

GGATATTCACrAT TO I'CGI"iTCAC :ACCCACCTCCCRACCCCGAGGGG-RCCCGRCRGGCCCGRRGGRRTR 

11*11 I t t 1 f • I t t T t * • I t t 1 I I I I t I I I t 1 I I 1 1 I I I I 1 t t I I I I I I I I I I I I I I I I I t I 

^p^l^CTCnCClT^rrtrrCtFrrii-cnt-.RCCCTCCTGCCPiRCRCCGAGIiOGPiCCnCCCGACAGGCCCGPiPiGGftPiTA 
7940 7950 7960 7970 7980 7990 8000


2720 


'5 /M‘'o 

!# t t-lLTI . - » It-.li" 


2740 2250 2260 2270 2280 

AGRSAGACAGATCCATTCGATTAGTGAACGGATCCTTAGCACTTATCTGG 


i i i i i : : t t : j : r * j i ( t 

iiiiiiiiilirti.iii 


GAAGAAGAAG37 GGAGAGCA' 
8010 3020 f. 


till J ( l i l l l i i i i t l l i i 1 l i t i i i i ii i i i l I i i t | i i i | 

iiii t t t i t t i t i t i i t t i t i i i i i i i i it i i i i i i i i i t i i i 


.AGAGGCAGATCAATTCGATTGGTGAACGGATTCTCAGCACTTATCTGG 
> 8040 8050 8060 8070 


2290 2300 2310 2320 2330 .2340 2350 

GRCGATCTGC.GVAur.'.rrTV-Jj.lCTr.TTCRGCTRCCACCGCTTGRGAGACTTACTCTTGATTGTARCGAGGRT 

I I t I * lit t t t 1 r 1 I I I i t I I i t t I t 1 > t t i 1 t t 1 I I ( I I t t I • t I I I I I I lilt l i i i i i i i i 

l l l l I t t I tit it t t i i t i i i i t i t i I 1 i t t i i t t 1 I I t i t I t I I I ■ I t I ■ t till I 1 I i I I i 1 I 

GACGACCTGALvC,:RAr:rj—TCi'l GCCi iJTTCAGTTACCACCGCTTGAGAGACTTACTCTTAATTGCAACGAGGAT 
8080 8090 8100 8110 8120 8130 8140


2360 2370 2380 2390 2400 2410 2420 2430

TGTGGRACTTC-fGGOOOGCAGGGr.lGTGRGARGCCCTCARATRTTGGTGGAATCTCCTACRGTRTTGGRGTCR 

i i i t i i < i i i i i •[ i i t t i t i t l i i i i i i i t t t i t t i i i i i i i i t i i i i i i i i i i t i i i i i i i iiii 
i t t t i i i t : r t i i i t :: i i i i i i t i i t t i i i i ■ i i i i i ■ i i t i i i ■ i i i i i i i i i i i i t ( i i t i i i i 

TGTGGAACTTClGGGACGCAGGuGGTGGGAAGCCCTCRAATATCTGTGGAATCTCCTGCARTRTTGGGGTCR 
8150 8160 8170 8180 8190 8200 8210 8220


X 

GGAACT A RAG 

I I I I I « 111 

I | 1 | I t lit 

GGAACTGFlAG 

8230 


4. KUNZ-15S-CL33. SEE 

HIVELICG Human lymphadenopathy virus (ELI isolate), complet

ID HIVELICG standard; RNA; 9176 BP.

XX 

AC X04414;
XX 

DT 17-OCT-1986 (incorporated)

XX 

DE Human lymphadenopathy virus (ELI isolate), complete genome.
XX 










KW acquired immune deficiency syndrome; env gene; gag gene; genome; 

KW long terminal repeat; pol gene; polyprotein; provirus;

KW reverse transcriptase.

XX 

OS Human lymphadenopathy virus 

OC Viridae; ss-RNA enveloped viruses; Retroviridae.

XX

RN [1] (bases 1-9176)

RA Alizon M., Wain-Hobson S., Montagnier L., Sonigo P.;

RT "Genetic variability of the AIDS virus: Nucleotide sequence
RT analysis of two isolates from African patients";

RL Cell 46:63-74(1986).

XX 

CC Acquired immune deficiency syndrome (AIDS) is caused by a
CC retrovirus known by several different names, probably representing
CC two separate strains: human T-cell lymphotropic virus-III
CC (HTLV-III) and lymphadenopathy-associated virus (LAV) are thought
CC to be one strain, AIDS-associated retrovirus type 2 (ARV-2) the
CC other. All three viruses, whose sequences do not differ by more
CC than about r-%, are believed to belong to the retroviral subfamily
CC Lentivirinae, or "slow" viruses. For the details of the annotation
CC and for other pertinent references, see the HIV reference entry.

XX 
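The entry header above follows the EMBL flat-file layout, in which every line starts with a two-letter line code (ID, AC, DT, DE, KW, OS, OC, RN, RA, RT, RL, CC) and `XX` lines act as spacers. The following is a minimal Python sketch of how such a header could be grouped by line code. The function name `parse_embl_header` and the sample text are illustrative assumptions (the sample is a cleaned-up excerpt, not the raw scan), and this is not the parser used to produce this listing.

```python
from collections import defaultdict


def parse_embl_header(text):
    """Group EMBL flat-file header lines by their two-letter line code.

    Hypothetical helper: continuation lines with the same code are joined
    with a single space; 'XX' spacer lines and blank lines are skipped.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        code, _, rest = line.partition(" ")
        code = code.strip().upper()
        # Keep only genuine two-letter codes; drop spacers and blank lines.
        if len(code) != 2 or not code.isalpha() or code == "XX":
            continue
        fields[code].append(rest.strip())
    return {code: " ".join(parts) for code, parts in fields.items()}


# Cleaned-up sample lines (an assumption about what the scan originally said).
sample = """AC   X04414;
XX
DE   Human lymphadenopathy virus (ELI isolate), complete genome.
XX
KW   acquired immune deficiency syndrome; env gene; gag gene; genome;
KW   long terminal repeat; pol gene; polyprotein; provirus;
KW   reverse transcriptase.
"""

record = parse_embl_header(sample)
print(record["AC"])
print(record["KW"])
```

Joining repeated codes into one string keeps multi-line fields such as KW and CC readable; a stricter reader would also split KW on semicolons.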


FH 

FH 

Key 

From

To

Description

FT 

RPT



R repeat 5’ copy 

FT 

RPT

i 

1 30 

5’ LTR 

FT 

SITE

.* 3’ ■*. 

> o'*. 1 

primer (Lys-tRNA) binding site

FT 

CDS 

... ' S‘S 

* . * S " 

gag polyprotein 

FT 

FT 

CDS 

1 Z'.y .'0 

.'I’.'.iP! 

pol polyprotein (NH2-terminus
uncertain; AA at 1904)
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•;:40 330 3SO 970 980 990 IOOO 
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TFaftAGEftAGCft 


:>pp. 


150 160 170 180 190

"ACn^CTCTATTTTG-TGCATCASATBCTAAAGCftT-ATQATACAQAQQTftCRTA 


GFFiAACAfV 7C<>1 
1010 


:..;t c-ojta 1 

i 070 


r^j 

w 


ASftTAATGGTGATTATTCAGA—ATTGGCCCTTAATGTTACAGAAAGCTTTG 
1010 1040 1050 1060 1070 


200 210 220 230 240 250 260 270 

ATG'l TTGGGCCACftCATGCGTGTGlT ACCCACARACCCGAACCCACAAGAAGTAGTATTGGTA—AATGTGACA 


PTi' 


- ~-;l '/VPP 


n T- -A( KVXr 


- ■ ftACARAAC-AGGGAATAGAGGACGTATGGCAACTCTTTGAGA 









1080 


1030 


1 1O0 


1 1 10 


1 120 


1 130 


280 290 300 310 320 330

-GftftftftTT rrftftCATGT£GAAftft~---ATGA-CAT-GGTAGAAC-AGATGCATGAGGATATAATCAG-TT 


I l 


t t l 


■ t 


ij- r riPiTr.r>fmy- tt-tc: rsT AftAATT atccccatt atscatt act atgagatgcaat aaaagtgagacagat ft 

1140 1150 1160 1170 1180 1190 1200


340 350 360 370 380 390

TAT- GSGA f-f'ft—AAGCGTftftft—GGCATGTGTftftftftTTAACCCC—AC—TCTGTGTTAGT—TTAAAGTGCA 

! ' 1 ! ! ! ! !'!!!!!' ! ! ’ ! ! < ■ > ■ • • • ■ • • * * • ■ ■ ■ 
GATGGGGA T'TGfti IftftftftTCft ,T:ftftCftftCftATAftCAACftGCAGCACCAACATCAGCACCAGTATCAGAAAAAA 
1210 1220 1230 1240 1250 1260 1270


■ uO 410 420 430 440 450 460 

CTO ,77'TTGGL-'Gft ATGCTACTAATACG AftTACT AG-TAATACCAAT AGT AGT AGCGGGGAA—ATGA—TGATGG 
\ • I ! I • \ ' I till ! ! ! I t » t i » » ii< i i it i i t i i ii i * 

rAGACATGGTCARTKAGACTAGTTCTTGTA-TAGCTCAGAATAATTGCACAGGCTTGGAACAAGAGCAAATG 
1280 1290 1300 1310 1320 1330 1340 1350


470 480 490 500 510 520 530 

ftGftAftGGft'nftRATftftftftr.ftCTGCTCTTTCftATftTCAGCACAftGINATAAGAGGTAAGGTGCAGAAAG—A AT A 


li i i I i I 


ATPA0CTG I ft Aft 1 TijACCATGACAGGGTTAAAAAGAG-ACAAGACAAAGGAGTACAATGAAACTTGGTACTC 
1360 1370 1380 1390 1400 1410 1420


540 550 560 570 580 590 600

TOO- --ATTT-TT TTATAAAC1 TGATATAATACCAATAGATAATGATACTACCAGCTATACGTTGACAAGTTG 


TACftGATTTGrn 
1 430 


FTGTGAAC—AAGSEftAT AGCACT—G AT AATG AA AGCAGATGC—T ACAT A AATC A—CTG 
1440 1450 1460 1470 1480 


610 620 630 640 650 660 670

TA-f-'CACCT' 'lAGTCATTACACAGecCTGT-CCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCC 

i i t i i it ii ii » it i tit lit* i i i * * * iii * i i i i i i i i i 

i i i t : i it ti ii i it t lit i i t i i t i * * i iii t i i i i t i i i i 

TAACACTTCTGTTATCCftAC-iftGTCTTGTGACAftACAT—TATTGGGATACTATTAGATTTAGGTATTGTGCAC 
1490 1500 1510 1520 1530 1540 1550


680 690 700 710 720 730 740

CGRGTSST TTT ciCGAI TCTAftftftTGTAATAATft-AGACGTTCAATGGAACAGGAC—CATGTACAAATGT—C 

i i i i i t i t : i i t i t i i t t i : : i i i it i i i i i i i i i i t i 

i i till tit i ti t iiititt i i ( i i i i * i i (til l ii ti 

CTCCAGGI TA'I GCT7 1 GCTTA6ATGTAATGACACAAATTATTCAGGCTTTATGCCTAAATGTTCTAAGGTGG 
1560 1570 1580 1590 1600 1610 1620 1630

750 760 770 780 790 800

AGCACAGTACAATGTACACATGGft—AT—TAGGC—CAGTAGTATCAACTCAACTGCTGTTGAATGGCAGTCT 

t i t : tit i i ■ i iii ii ii i iii i i ii iii i ti i i i i i i i 

> i t t I ; I i i i i III II ti I III i f II III i it i i i i I i i 

TG57CTCTTC-ATG—CftCAAGGAT6AT GGAGACACAG—ACT—TCTACTTGGTTTGGCTTTAATGGAACTAG 
1640 1650 1660 1670 1680 1690


810 820 830 840 850 860 870

AGCAGAAGA- AGAG.GTAGTAATTAGATCTGCCAATTTCACA6ACAATGCTAAAACCATAA—TAG—TACAGC 


AGCftGftAftftT AGAACTT AT ft TTTA CTGGC—ATGGT AGGGAT AA T AGGACT AT AATT AGTTT AA AT A 

1700 1710 1720 1730 1740 1750 1760 


880 890 900 910 920 930 940

TOAACCAATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAAAAGTATCCGTATCCAGAGGG 


AGTATTATAATCTAACAATGAAATGTAGAAGA-CCAGGA-AAT-AAG-ACAGTTT-TA-CCA-GT 

1770 1780 1790 1800 1810 1820


950 


SBO 


370 


330 


330 


lOOO 


1010 


GACC'ft- GGGAGAGCftTT -TGTTAGAATAGGAAAAATAGGAAATATGAGAC—AAGCA-CATTGTAACATT 

CACf. .YVTTATGTC TGGATTGGTTTT CCACTCACAACCA—ATCAATGATAGGCCAAAGCAGGCATGGT-GTT 





18-50 


1840 


I860 


1870 


1880 


1020 1030 1040 1050 1060 1070

AG—TAGARCAAAATGCAATGCCACTTTAAAACAGAT-AGCTAGCAAATT-AAGAGAACA-AT-TTG 

! I ! I ! ! ! I I ! I 't ! I I I t I I I * * i * * * * 1 • ■ * ■ 11 11 11 

□GTTTGGAGfSAAAATGGAAGGATKCAATAAAAGAGGTGAAACAGACCATTGTCAAACATCCCAGGTATACTG 
1890 1900 1910 1920 1930 1940 1950 1960

1080 1090 1100 1110 1120 1130 1140

Eflp^TAATAAARC- AAT—AATC—TTTRAGCAATGGT—CAGGAGGGGACCGAGAAATTGTAAGGCACA—GT 

; ; ; ; ; ; ; ; ; ; ; ; !!;! iiiii ;!!! ;!;!;! !! ; ; ! i ; ! > ■ • > > ■ ■ > 

□AACTAACAATACTGATAAAATCAATTTAACGGCTCCTGGASSAG6AQATCCA6AA-GTTACCTTCATGT 

1970 1980 1990 2000 2010 2020

1150 1160 1170 1180 1190 1200

—TTTAAT fGTGGAGGGGAATTTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGTTTA—ATAG 

t t ; t i i i : i ii it i i i i t i i t t i it ii ii ti ii i * ill i lilt 

i i i i i tit* ii ii i i i t i i i i i t it it ii ii *i * i iti i i i i i 

GGACAAATTaCAGAGGAGAGTTCCTCTACTGTAA-AA—TGAATTGGTTTCTA—AATTGGGTAGAGGATAG 

2030 2040 2050 2060 2070 2080 2090

1210 1220 1230 1240 1250 1260 1270 1280 

TACT TGGAGTACTGAAGGGTCAAATAACACTGAAGGAAGTGACACAATCACACTCCCATGCAGAATAAAACA 

i ttit i > i i i i i iii til ii i i i i i t i t t i i i i t i i 

i i t i i i i i i i i i ill ill ii i i i i t t i t i l i i i * * i 

GEATGT AACTACCCAGAGGCCAAAGGA—AC-GGCATAGAAGGAATTAC—GTGCCGTGTCATATTAGACA 

2100 2110 2120 2130 2140 2150 2160 

1290 1300 1310 1320 1330 1340 1350

ATT'i ATAAACATGTGGCAGGAAGTAGGAAAAGCAATG—TATGCCCCTCCCATCAGCGGACAAATTAGATGT 

t i ii titt t i t i i i i i i i i i iti till iii i i i i i i iii i i i iii 

I ! II tilt till! ! : I I I I I III till III I I 1 I I I III I I I III 

ART AATCAACACTTGGCATAAAGTAGGCRAA—AATGTTTATTTGCCTCCAAGAGAGGGAGACCTCACGTGT 
2170 2180 2190 2200 2210 2220 2230

1360 1370 1380 1390 1400 1410 1420

l'CA'1 CAAft r.ATTACAGGGCTGCTATTAACAAGAGATGGTGGTAATAACAACAATGGGTCCGAGATCTTCAGA 

■ it : i i iii i i it t t t t t i iii i t t i i i i i i ii 

i i : iii tii it it i i i i i i iii i i i i i i i t i tt 

AAC’iCCACAGTGACCAGTCTCATAGCAAACATAGATTGGACTGATGGAAACCA-A ACT AAT ATCACCATG 

2240 2250 2260 2270 2280 2290 2300

1430 1440 1450 1460 1470 1480 1490 

CCTG—GAK3ARK-AGATATGAGGKACAATTGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAA 

ii tilt ti lit i i i i i i t i i i i it i i i i t t i i i t i l i i ii 

ii tit: ii iii i t t i i t i i i t i it i i i i i i i i i i i i i i it 

AGTGiCAGAGGTC-GCAGAACT—GTATCGATTGGAGTTGGGAGAT-TATAAAT-TAGTAGAGATCACT 

2310 2320 2330 2340 2350 2360 

1500 1510 1520 1530 1540 1550 

CCATTAGGAGTARCACCCACGAAEGCAAAGAG-AAGAGTGGT—GCA—GAGAGAAAAAAGAGCAGTGGG 

ii i it i ti t i i i i t t : i i i i t i iiii iii i i t i i ti ii ii iii 

it t ii ii t i t i i * t t i i i i i i iiii iii i i i t i it ii ii iii 

r:Cc:ATCGGCTT2GCC:CCCACAGATGTGAAGAGRTACACTACTGGTGGCACCTCAAGAAATAAAAG-AG-GGG 
2370 2380 2390 2400 2410 2420 2430
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AATAGGAGCTTTGTTCCTTGGGTTCT-TGGGAGCAGCAGGAAGCACTATGGGCGCACGGTCAATGACGCTGA 

: i tii i t i i i i i i t t i i t i i tiiit i i i i i i i i i iii t i i i i i i i i 

i t iii i i i i t : ; : i i i t i i t i i i t i i » i t i i i i i tii i i i i i i i t t 

TCT1TGTGC r RlI;66TTCTTGRGTTTTCTCGCAACGGCAGGTTCTGCAATGGGCGCGGCGTCGTTGACGCTGA 
2440 2450 2460 2470 2480 2490 2500 

1630 1640 1650 1660 1670 1680 1690

CGGTACAEi boGAI 3ACAATTATTE7CTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGC 

i t iii ii • i i t i i t iiii i i i t i • i i i i i iii i i it ii i i ii i 

i t tit ti i i i i t iiii • ■ t > t i i t i i i til > i ii ii i t ii i 

CCGCTCAGTCCCGGACTTTATTGGiCTGGGATAGTGCAGCAACAGCAACAGCTGTTGGACGTGGTCAAGAGAC 
2510 2520 2530 2540 2550 2560 2570 
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AACAGCATCTET fGCAACTCRCRiTfCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGAT 

titi i t : t t i t tii it I t I t t i i 1 I ill I I t p i I I t II t i it * ii * * 

itrt : : : i i t i ill ■ : i I t t i i i t i i i i i i i i * i i t i i t t i * * i i t i 

AAi7V-M~AA rv (; 'Gi -IOOTi .i^rinrnTITGRRGRAf lARAGAAOCTCCAGACTAGGGTCACTGCCATCGAGAAGT 
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lAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTT 

. . , « , t ...... i • . • •! * . t i lilt . ■ i i .* . 
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RC7TAAAG.7Rr:5nrJc:raCA6CTGiAATACTTGGGGATl.i;TGCeTTTAGACAAGTCTGCCACACTACTGTACCAT 

2660 2670 2680 2690 2700 2710 2720
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BG-AATGCTRBTTGGAGTAATAAATCTCTGGAACAGATTTGGAATAACATGACCTGGATGGAGTGGGACA 


. ; 1 J ; t J t t I I I I I I t I I I II I * II I III I I I I I I I I 

GI3CCAAATGCAAGT-CTAACACCAGACTGGAACA-AT-GA-TA-CTTGGCRRGRGTGGGRGC 

2730 2740 2750 2760 2770 

1920 1930 1940 1950 1960 1970 1980

GRGRRRTTRRC-RRTTRCRCRRGCTTRRTRCRTTCCTTRRTTGRRGRRTCGCRRRRCCRGCRRGRRRRGRRT 

, , . . . .. it i i ii ... . . . . ii*t* . * * i i t i » ..tit 

, , , t , i ( it t i it til ti i i i i t i t t i i i i t i i i i i t t i i i t 

gaaaggttgacttcttggabgaaaatataaca-gccctcctagaagaggcacaaattcaacaagagaagaac 
2780 2790 2800 2810 2820 2830 2840 2850


1990 2000 2010 2020 2030 2040 2050 

GAACAAGAATTATTGGAATTAGATAAATG6GCAAGTTTGTGG—AATTGGTTTAACATAACAAATTGGCTGTG 

, iiiii. i lit lit till i it i ill i t i I I i i l i ii i t l l l i l 

i t i t i * i l III 111 lilt l it I ill i I i I i i I i i ii l i i i i i l 

ATGIATGAATTACAAAAGTTAAATAGCTGGG-ATGTGTTTGGCAATTGGTTTGACCTTGCTTCTTGGATAAA 
2860 2870 2880 2890 2900 2910 2920

2060 2070 2080 2090 2100 2110 2120

GT.-M 'ATP A A A AT ATT C A—TAATGAT AGT AGGAGGCTTGGT AGGTTT AAGAAT AGTTTTTGCTGT ACTTTCT 

: i : : i i i ; : ; i i i i t t i I t i i I lit l i i i i i i i .. t i It i 

i t i i t i i i t i i i i i i t i lilt i ill i ii i i i t i i i ■ i i ■ i t ii i 

GTPfl AT—ACAATATG]QAATTTA“I G -T AGTTGT AGGAGT RAT ACTGTT AAGAAT AGTGATCT AT AT AGT RCA A 
2930 2940 2950 2960 2970 2980 2990

2130 2140 2150 2160 2170 2180

AT AGTGA ATAGi AG' f T- AGGCAGGGAT AYTCACCA-TTATCGTTTCAGACCCAC-CTCCCA-AC 

ii i t t ; i i t i i i i : : i i i i i lit it i i t i t tilt i i tit it 

t i i it tilt tit. tit. ttr til ii ii l i > till t i til it 

ATGC f AGCTA - Ai.-TTAAGGCAQGGGTATAGGCCAGTGTTCTC-TTCCCCACCCTCTTATTTCCAGTAGACTC 
3000 3010 3020 3030 3040 3050 3060

2190 2200 2210 2220 2230 2240 

-CCCGAGGGGACCCGACA -GGCC—C-GAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACA 

iii i t r i i i i i ii tti i iiiii i t i i i i lit i i i t i i i i i i i iii 

tti i i i i i i i t ti tii i iiiii i iiiii itt i i t t i i i i i i t iii 

ATACCCAACAGG.ACCC&ECACTSCCAACCAGAGAAGGCAAAGAAGGAGACGGTGGAGAAGGCGGTGGCAACA 
3070 3080 3090 3100 3110 3120 3130 


2250 2260 2270 2280 2290 2300

GATC-CATTCB47TAGTBAACGGATCCTTAGCACTTATCTGGGACGATCTGCGGAGC—CTTG-TGC 

itt till i iii i i t t t i i i i i t iiiii ii i i i i it 

ill t t t t i t : i i i i i t it lit t t i i i i ii i i i i ii 

GCTC.CTGGCCTTGGCAGATAGAATATATTCATTTC—CTGATCCG CCA ACTGAT ACGCCTCTTGACTTGG 

3140 3150 3160 3170 3180 3190 3200


2310 2320 2330 2340 2350 2360 2370 

CTCTTCAGCTACCACCGCTTGAbiP.GACTTACTCTTGATTGTA-ACGAGGATTGTGGAACTTCTGGGACGCAG 

t J t I 1 ! 1 I . II III I t t I I t t I t I I I I 1 I I 1 III I I II 

CTATTCAGCAA-CTGCAGAACCTTGCTATCGAGAGCATACCA-GATCCTCCAACCAATACTCCAGAG 

3210 3220 3230 3240 3250 3260 


2380 2390 2400 2410 2420 2430 2440

SGG5TGGGAAGCCCT~CAAA—TATTGSTGEAATCTCCTACAGTATTGGAGTCA—GGAACTAAAG 

i i i : : i i i i i iiiii t i t : i i i i i t t i t i iiiii 

GCTCTCTGCGACCCTACGAAGGATTC-GAE—A AGTCCT—C AGGACTG A ACTGACCT ACCT ACA A 
3270 3280 3290 3300 3310 3320 X 
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FT 
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45-33 
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FT 

SITE 

4733 
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FT 
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FT 

SITE 
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FT 
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3btS3 

X gene product (AA 1-112) 

FT 
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5637 


5632 
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FT 

SITE 

5753 
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ORF, exon 1 (tat) 

FT 

CDS 
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tat gene product (AA 1—99) 

FT 





(6083 is 2nd base in codon) 

FT 

S ). 1 L 
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* 
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FT 

CDS 
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ORF (env) 

FT 

I VS 

oOS4 
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S735 
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SITE 
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FT 
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FT 
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3’ LTR 

FT 
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3382 
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enhancer-1ike sequence 

FT 

SITE 

9332 


5402 

conserved sequence 

FT 

SITE 

3404 


34 1 2 

conserved sequence 

FT 

SITE_ 

‘r-nJ. i. 4 


*T; '\ p n. 

conserved sequence 
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Initial Geore 
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Gaps ~ 

280 

5 wi/o 

273 

Optimized Score = 1240 

Matches = 1422 

Conservative Substitutions

Significance =
Mismatches = 

0. oo 

890 

0 
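The summary block above reports match and mismatch counts for the alignment that follows. As a hedged illustration of how such counts relate to a percent identity, the sketch below tallies matches, mismatches, and gaps from two aligned strings and converts counts to a percentage. The function names are hypothetical, and including gapped columns in the denominator is an assumption; the program that produced this printout may define identity differently.

```python
def count_alignment(top, bottom, gap_char="-"):
    """Tally matches, mismatches and gapped columns of two aligned strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gaps = 0
    for a, b in zip(top.upper(), bottom.upper()):
        if a == gap_char or b == gap_char:
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


def percent_identity(matches, mismatches, gaps=0):
    """Percent of aligned columns that are identical (gaps count as columns)."""
    columns = matches + mismatches + gaps
    if columns == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * matches / columns


# Toy example: 4 matches, 1 mismatch, 1 gapped column.
m, x, g = count_alignment("ACGT-A", "ACCTTA")
print(m, x, g, round(percent_identity(m, x, g), 1))

# Counts as printed in the summary above, ignoring gaps.
print(round(percent_identity(1422, 890), 1))
```

Passing `gaps=0` gives identity over ungapped columns only, which is the more lenient of the two common conventions.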

X 10 

20 

30 40 

50 60 



ATCAGAG'rGAAL-iGAGAAATA rCASCACTTGTGGAGATGOGGGTGGA-RATGGGGCACCftTGCTCCTTGGG 


i i lit 


■ it i 


CTTRTCSCrATCTTG-CTTCTAAGTGTCTATeG-GAT-TTATTGTATTCAATATGTCA-CAGTCT-TTTATG 
X 6120 6130 6140 6150 6160 6170 6180


70 80 90 100 110 120 130


ATATTGATGATC fGi TAG-TGCTACAGAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAG 


i l II III 


II I I l l 


GTGTACCAGCT-P3GAGEAArGCGftCftATTCCCCTCTTCTGTGCAACCAAGmATAGGGATACT-TGGGGAAC 
6190 6200 6210 6220 6230 6240 6250


140 150 160 170 180 190 200 

GAAGCA—ROC-RCCRCTCTRTTTTG-TGCRTCRGRTGCTRRRGCRT-RTGRTRCRGRGGTRCRTRRTGTTT 

! I ! I I I I 1 I I I I I I III!! 1 * t * * * » » i t i » t « • > * * 


A AC-TC AGTGCCTaccagat artsatgattattcaga—attggcccttrrtgtt rcrgrrrgctttgrtgctt 
6260 6270 6280 6290 6300 6310 6320


210 220 230 240 250

□S-GCGAC-RCRTGC—CTGTGT AGCCACAG RCCC CRRCCCRCRRGRRGTRGTRTTG 

|| I'll tltit i l i l i l III i I I I l I I I II III 

GGGAGAAT ACAGTCACAGAACASGCAAT AGAGGACGT RTGGCRRCTCTTTGRGRCCTCRRT RR—RGCCTTG 
6330 6340 6350 6360 6370 6380 6390 


260 270 280 290 300 310 320

•-GTAAATG rGACACAAAAT7TT RACA-TGTG-GAAAAATGACATGGT AGAACAGATGCATGAGGAT AT RRTC 


i i t 


i i t 


TGTAAAATT ATCCCCATTATGCATT ACT ATGRGRTGCRRT AAAAGTGAG—ACAGATRRRTGGGGRT—TGACA 
6400 6410 6420 6430 6440 6450 6460 


330 340 350 360 370 380 390 

AGTTTAT—GGGATCAA—RGCCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTAGTTTAAAGTGCACTG 


i : t 


AAATCATCAACAACAACAGCATCAA—CAACAACAACAACAACAGCAAAATCAG-TAGAGACAAGAG—AC— 
6470 6480 6490 6500 6510 6520


400 410 420 430 440 450 460 

ATTTGGGGAATGCTACT AAT ACCAAT ACT AG—TAATACCAATAGTAGTAGCGGGGAA—ATGA—TGATGGAGA 


t i ii 


AT-AGTCAATGAGACTAGT-CCTTGTGTAGTTCATGATAATTGCACAGGCTTGGAACAAGAGCCAATGATA 

6530 6540 6550 6560 6570 6580 6590


470 480 490 500 510 520 530

AASGAGAGAT AAAAAAGTGCTCTTTCAATATCAGCACAAGNATAAGAGGTAAGGTGCAGAAAG—AATATGC 


AGCT GTARATTCAACATGACAGGGTTAAAAAGAG—ACAAGAAAAAGGAGTACAATGAAACTTGGTACTCTGC 
6600 6610 6620 6630 6640 6650 6660


540 550 560 570 580 590 600 

—ATTT-TTTTATAAACTTGATATAATACCAATAGATAATGATACTACCAGCTATACGTTGACAAGTTGTAA 

i t i i : t t i rii i lilt ii i i i i i i t i i it i t i i iii i i i i i 

AGA' TTTGGfTTG TCAAG—AAGGGAATAGCACT-GGTAATGAAAGTAGATGTTACA-TGAATCACTGTAA 

6670 6680 6690 6700 6710 6720 6730


610 620 630 640 650 660 670

CACCTGAGTCATT ACACRGGCCTGT—CGAAAGGTATCCTTTGAGCGAATTCCCATACA—TTATTGTGCCCCG 


I I 






TACTTCTGTTATCCAnGAGTriTTGTGACAARGAT-TATTGGGATGCTATT-AGATGTAGATATTGTGCACCT 

6740 6750 6760 6770 6780 6790 6800


680 690 700 710 720 730 740

GCTBGTTTTGCGATTCTAAAATGTAATAATA-AGACGTTCAATGGAACAGGACCA—TGTACAAATGT-CAG 


II II 


CGAGGTTATGCTTTGCTT AGATGTRATGACACAAATTATTCAG6CTTTATGCCTAACTGTTCTAAGGTAGTG 

6810 6820 6830 6840 6850 6860 6870


750 760 770 780 790 800 810 

CACAGT ACA AT GT ACACRTGGA-AT-TAGGC-CAGTAGTATCAACTCAACTGCTGTTGAATGGCAGTCTAG 


GTC1 CTTC-ATG—CACAAGBATGATGGAGACACAG—ACT—TCT ACTTGGTTTCGGTTT AATGGAACT AGAG 

6880 6890 6900 6910 6920 6930 6940


820 830 840 850 860 870 880

CAGAAGA-AGAGGTAeTAATi'AGATCTGCCAATTTCACAGACAATGCTAAAACCATAA-TAGTACAGCTGAA 

(•*11 | til It III III I II I III II II II till III! I I II 

, I , t | | III If III III I II I III II II * I I I t I till I t It 

C AG A A A AT AGAACCT AT ATT T A-CTG6C-ATGGTAGAGAT AA-T AGGACT AT A ATT AGTCT A A AT—A A 

6950 6960 6970 6980 6990 7000


890 900 910 920 930 940 950

CCAAT-CTGTAGAAATTAATT6TACAAGACCCAACAACAATACAAGAAAAAGTATCCGTATCCAGAGGGG 

it : I <1 lit t i tilt till III t ill lit I i t t • II til I 

t i i I it til t i i i : I till lit t lit tli l t l l * • • ill i 

GCA1TATAATCTAACAATGAAmTBTAGAAGA—CCAGGA—AAT—AAG—ACAGTTT-TA-CCA-GTC 

7010 7020 7030 7040 7050 7060 


960 970 980 990 1000 1010

ACCA—GGGAGAGCATT—TGTTACAATAGGAAAAATAGGAAAT ATGAGAC—AAGCA—CATT6T—AACAT 

tiii t i t i i i t i • i t t lit it iti i t i i i i t t i t i i t i i i 

i t t t i t i i t i i t i i t t tit ii tit i i t i i i t t t t t t t i i i 

ACC ATT ATGT C"l GC ATTGGT TTTCCACT—CACAACCAGTCAATGAGAGGCCAAAGCAGGCATGGTGTAGGT 
7070 7080 7090 7100 7110 7120 7130


1020 1030 1040 1050 1060 1070 

T AGT AGAGCAAAAT GCAATGCCACTTT AAAACAGAT—AGCTAGCAAATT-AAGAGAACA-AT-TTG 

i : t i : it: it it t i t i i i l i i t i t i i i i i i i l t l i l i i i 

i t iti iti i i ii t t t i t i i it i lit ii tit ii i ii it ii 

T—TGGAGGAAATTGGAAGGAGGCAATAAAAGAGGTGAAGC-AGACCATTGTCAAACATCCCAGGTATACTG 
7140 7150 7160 7170 7180 7190 7200 


1080 1090 1100 1110 1120 1130 1140

GAAAT AATAAAAt >- AAT—RATC—TTTAAGCAATCCT—CAGGAGGGGACCCAGAAATTGTAACGCACA—GT 

it: iii : t ii ii i i i i iti t iiti t i i i i i it it til ii ii ii ii 

tit iii tt ii it iiti iti i lilt t ... ii iii ii it it ii 

GAAGT AACAAT ACTGAT AAAA7CAATTTGACGGCTCCT AGAGGAGGAGATCCGGAA-GTTACCTTCATGT 

7210 7220 7230 7240 7250 7260 7270 

1150 1160 1170 1180 1190 1200 

— nTAATTGTGGAGGBBAA ITTTTCTACTGTAATTCAACACAACTGTTTAATAGTACTTGGTTTA—ATAG 

ttitj i i t it itt t i i i i t i i i i ii ■■ ii it ii i i iii i i i i i 

i i i i i i i t i iii i i i i i i t t i i it ii ti it it • i lit i iiii 

GGACAAATTGCAGAGGAGAGTTTCTCTACTGTAA-AA—TGAATTGGTTTCTA—AATTGGGTAGAAGATAG 

7280 7290 7300 7310 7320 7330

1210 1220 1230 1240 1250 1260 1270 1280 

TACT rGGAGTAC fGAAGGGlCAAATAACACTGAAGGAAGTGACACAATCACACTCCCATGCAGAATAAAACA 

>i till i t i i i i i » i > t ii i iitti i i i i i i . 

ti i : t j i t i i i i t t t i i it i i i i i i i i t t t i i i i i i i 

GAGTCT AACTACCCAGAAGGCAAAGGA—ACGGCATAAAAGG-AATTAC—GTACCATGTCATATTAGACA 

7340 7350 7360 7370 7380 7390 7400

1290 1300 1310 1320 1330 1340 1350 

ATTTATAAACA7 GTGGCAGGAAGTAGGAAAABCAATG-TATGCCCCTCCCATCAGCGGACAAATTAGATGT 

t i t t i t i i t i i t i i t t t i t iii till iti t i i i i i iii i i i iii 

t i t i i : ] i t i t r i : i t t i i i III i i t t III i i i I i i i t i I I t i i i 

ARTRATCARCACTTGGCATAAAGTAGGCAAA—AATGTTTATTTGCCTCCAAGAGAGGGAGACCTCACGTGT 

7410 7420 7430 7440 7450 7460 7470 

1360 1370 1380 1390 1400 1410 1420

TCATCAAArATTACAGGGCTGCTRTTAACAAGAGATGGTGGTAATAACAACAATGGGTCCGAG-ATCTTCAG 

tit iii ■ i ■ ■ i i i i i i 

iii tii i i i i i i i i i i 




AACTCCACnGTGACCAGTCTCATAGCAAACATAAATTGGACTGATGGAAACCA-AACTAGTATC- 

7480 7490 7500 7510 7520 7530

1430 1440 1450 1460 1470 1480 1490

ACCTGGAGGftGGAGATATGAGesGAUAATTGGA—GAAGTGAATTATATAAAT ATAAAGT AGT AAAAATTGAA 


ACCATGA-GTCCAG—AGGTGGCARAACTGTATCSATTGGAATTGGGAGATTATAAATTAGTAGAAATCACT 
7540 7550 7560 7570 7580 7590 7600


1500 1510 1520 

CCATTAGGAGTAGCACCCACCAAGGCAAAGAG- 


1530 1540 1550 

-AAGAGTGGT-6CA-GAGAGAAAAAAGAGCAGTGGG 


CCAATTGGUTT5GCCC;CCACAAATGT5AAGAGGTACACTACTGGTGGCACCTCAAGAAATAAAAG-A6-GGG 
7610 7620 7630 7640 7650 7660 7670 


1560 15' 

AATAGGAGCTTTn 


1580 


1590 


1600 


1610 


1620 


iTTCCTTGGGTTCT-TGGGAGCAGCAGGAAGCACTATGGGCGCACGGTCAATGACGCTGA 


III III 


TCTTTGTGC r OGGGT T GTTGr^GTTTTCTCGCAACGRCAGGTTCTGCAATGGGCGCGGCGTCGTTGACCGTGA 
7680 7690 7700 7710 7720 7730 7740


1630 1640 1650 1660 1670 1680 1690 

CGGiT ACAGGiCC AG AC AATT ATTGTCTGGT AT AGTGC AGCAGCAG AAC A ATTTGCTGAGGGCT ATTGAGGCGG 

■ . ill <> i i • • i i i it.. i • i i i i iii • t t • •• • ■ <i • 

I ) III II t t t » I I 1 lilt I t I t T I I I 1 t I 111 I I It It I I It t 

CCGCTCAGTCCCGCACTTTATTGGCTGGGATAGTGCAGCAACAGCAACAGCTGTTGGACGTGGTCAAGAGAC 
7750 7760 7770 7780 7790 7800 7810 7820 

1700 1710 1720 1730 1740 1750 1760 1770 

AACA6CATCT6TTGCAACTCACABTCT6GEGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGAT 

t l • i l I t I i I i III II l l i l i i t i i III i i l i l l I l ii II II l II 1 l 

AACAAGAATTGTTGCGACTGACCSTCTGGGGAACAAAGAACCTCCAGACTAGGGTCTCTGCCATCGAGAAGT 
7830 7840 7850 7860 7870 7880 7890

1780 1790 1800 1810 1820 1830 1840 

ACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTT 

ii t t i i i i i it it ii » i i i t i it i i ii i it till iiti it t 

ti i i t i i i t ii it it i t i i t i it i i it i it i t i i titi ti i 

acttaaaggaccaegcgcagctaaatgcttggggatgtgcgtttagacaagtctgtcacactactgtaccat 
7900 7910 7920 7930 7940 7950 7960 

1850 1860 1870 1880 1890 1900 1910

GG-aatgctagttggagtaataaatctctggaacagatttggaataacatgacctggatggagtgggaca 


GGCCAAATGCAAGT-CT.4A— 

7970 7980


-CACCAGA-TTGGAACAATGAGACTTGGCAAGAGTGGGAGC 
7990 8000 8010 8020


1920 1930 1940 1950 1960 1970 1980

l-.'AEAA ATT AACAATT ACACAAGCTT AATACA—TTCCTTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAA 

i i it it it it iiii i ii i t i i t i ■ i i i i i ii i i i i i i i i ■ i 

t ii ii it it till i ii i i i i i ■ i t till ii i i • i i 1 i i t i 

6GAAGGTTGAC—TTCTTGGAGGCAA.AT AT AACGGCCCTCCT AGAAGAGGCACAAATTCAACAAGAGAAGAA 
8030 8040 8050 8060 8070 8080 8090 

1990 2000 2010 2020 2030 2040 2050 

TGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGG-AATTGGTTTAACATAACAAATTGGCTGT 
I * < i! t i t * ! ! ! * iiii i ii ! ii! i ■ i i i < i * i tt i ii i i i i i 

C-ATETATGAA TTACAAAAGT TGAATAGCTGGG-ATGTGTTTGGCAATTGGTTTGACCTTACTTCTTGGATAA 
8100 8110 8120 8130 8140 8150 8160

2060 2070 2080 2090 2100 2110 2120 

GGTATATA-AAAAT-ATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTT 

| I | t | I I t I 1 I III || It II I I I I I I I I II * I I I I I I I I I I I t II I 

I | ! I t I I It II t t t t I 11 It I I I I I I I I * I I I I I I 1 I f I f I I I It t 

AGTATATACAATATGEAATTTAT-ATAATTGTAGGAG TARTACTGTTAAGAATAGTGATCTATATAGTA 

8170 8180 8190 8200 8210 8220 8230 


2130 2140 2150 2160 2170 

TGTAT AGTGAATAGACTTAGGCAEGGATATTCACCA-TTATCGTTTCAGACCCACCT- 









CAAATGCTAGinAGGTTAAGACnGSGSTATAGGCCAGTGTTCTC-TTCCCCACCCTCTTATTTCCAGTAGAC 
8240 8250 8260 8270 8280 8290 8300


2180 2190 2200 2210 2220 2230 2240

CCAACCCClsAGGG&ACCCGAC-AGGCC—C-GAAGGAATAGAAGAAGAA6GTGGAGAGAGAGACAGAGA 


CGATAGCCAACAGGATCCGGCTCTGGGAACCAAAGAAGGCAAAAAAGGAGACGGTGGAGGCAGCGGTGGCAA 
8310 8320 8330 8340 8350 8360 8370


2250 2260 2270 2280 2290 2300

CAGATC-CATTCGATTAGTGAACGGATCCTTAGCACTTATCTGGGACGATCTGCGGAGC—CTTG-T 

I!!!! ! ! I I I ! ! i i t » i t < t i i i t i » i i it iiii i 

CAGCTCCTBSCCTTEGCAGATAGAATATATTCATTTC-CTGATCCG-CCAACTGAT ACGCCTCTTGACTT 

8380 8390 8400 8410 8420 8430 8440 


2310 2320 2330 2340 2350 2360 2370

GCCTCTTCAGC* ACCACCGCTTGABAG.ACTTACTCTTGATTGTA-ACGAGGATTGTGGAACTTCTGGGACGC 


GGCTATT CAl.'.Gr. A-CTGCAGAACCTTGCTATCGAGAGCATACCA-GATCCTCCAACCAATATTCCAG 

8450 8460 8470 8480 8490 8500 

2380 2390 2400 2410 2420 2430 X

AGGGGGTGGGAAGGCCT—CRAATATTGGTGGAATCTCCTACAGTATTGGAGTCA—GGAACT AAAG 

t t i t i tilt i iii iii i i i i t i t i i i i i i i i i i i 

ill i t : t i i i til til i i i t i i i i t i i i i i i i t i 

AGeCTCTCTGCLACCCTACGGAGAATTCGAGAA-GTCCT-CAGGCTTGAACTGACCTACCTACAA 
8510 8520 8530 8540 8550 8560 8570


7. KUN2- i 58-CL.52, SE& 

RESIVAXX Simian immunodeficiency virus STLV-III(AGM) provir

ID RESIVAXX standard; DNA; 9264 BP.
XX
AC Y00295;
XX
DT 29-SEP-1987 (annotation)
XX
DE Simian immunodeficiency virus STLV-III(AGM) proviral genome
XX
KW env gene; envelope glycoprotein; gag gene; genome;
KW long terminal repeat; overlapping genes; pol gene; protease;
KW Q gene; reverse transcriptase; sor gene; transfer RNA;
KW transfer RNA-Lys; unidentified reading frame.
XX
OS Simian immunodeficiency virus
OC Viridae; ss-RNA enveloped viruses; Retroviridae.
XX
RN [1] (bases 1-9264)
RA Reitz M.;
RT ;
RL Submitted (17-JUL-1987) on tape to the EMBL Data Library by:
RL Reitz M., LTCB, NIH, Bldg 37 Room 6C09, Bethesda MD 20892, USA.
XX
RN [2] (bases 1-9264)
RA Franchini G., Gallo R.C., Guo H.G., Gurgo C., Collalti E.,
RA Fargnoli K., Hall L., Wong-Staal F., Reitz M.S.;
RT "Sequence of simian immunodeficiency virus and its relationship to
RT the human immunodeficiency viruses";
RL Nature 328:538-543(1987).
XX
CC *source: library=EMBL-3; cell line=infected K6W
XX 

FH Key   From    To         Description
FH
FT SITE  1       176        R-region of 5'-LTR
FT RPT   1       305        5'-long terminal repeat
FT SITE  132     157        pot. polyA signal
FT SITE  177     303        U5-region of LTR
FT TRNA  306     323        transfer RNA-Lys(3)
FT CDS   537     t-i W wf'  protein p17 (AA 1 - 115)
FT CDS   533     8057       gag gene product
FT CDS   084     2056       protein p24 (AA 1 - 391)
FT CDS   1714    2265       protease (AA 1 - 184)
FT CDS   1714    4375       pol gene precursor polypeptide
FT RPT   1775    1756       imp. direct repeat 1
FT RPT   17SB    1313       imp. direct repeat 1
FT CDS   2266    4875       reverse transcriptase (AA 1 - 870)
FT SITE  4552    4563       polypurine tract
FT CDS   4778    5443       sor gene product
FT CDS   5702    7303       large envelope glycoprotein gp120 (AA 1 - 536)
FT CDS   3702    S371       env gene product
FT CDS   7310    3371       small envelope glycoprotein gp32 (AA 1 - 354)
FT CDS   0136    8828       3'-ORF (AA 1 - 211)
FT SITE  8576    8530       polypurine tract
FT SITE  8531    3022       U3-region of LTR
FT RPT   853X    3264       3'-long terminal repeat (LTR)
FT SITE  00253   3264       R-region of 3'LTR
FT PRM   3053    3083       pot. TATA-box
XX





SQ Sequence 9264 BP; 3121 A; 1743 C; 2309 G; 2085 T; 0 Other;


Initial Score = 294 Optimized Score = 1240 Significance = 0.00
Residue Identity = 54% Matches = 1421 Mismatches = 897
Gaps = 26G Conservative Substitutions = 0


X 10 20 30 40 50 60

ATGAGAGTGlS'.ftGKAaAAAmrCftGCACTT—GTGGAGATGGG-GGTGGA-AATGGGGCACCATGCTC 


■ lit t iii t i it lit i i i i i ill tit i i i i i ii 

till i lit t i ii lit i i i i i ill lit i i i i i ii 


ATCAGCTGCTTiTCGCCATCT-TGCTTTTAAGTGTCTATGGGATCTATTGTACTCAATATGTCA-CAGTCT- 
X 5750 5760 5770 5780 5790 5800 5810


70 80 90 100 110 120 130

CT1 GGGAT ATTGATGATCTG T AGi-TGCT ACAGAAAAATTGTGGGTCACAGTCT ATT ATGGGGT ACCTGTG 

I : til I * it ii t i I I 1 I I I t I it I I 1 1 1 till 

t 1 III I t t I t I I I I I I I II II II 1 I I I I I I I I 

TTTATeGTGTACCAGCT-TGGAGGAATeCGACAATTCCCCTCTTCTGTGCAACCAAGAATAGGGATACT-TG 
5820 5830 5840 5850 5860 5870 5880


140 150 160 170 180 190 

TGGAAGGAAGCA—ACC—ACCACTCTA1TTTG—TGCATCAGATGCTAAAGCAT—ATGATACAGAGGTACATA 


i i t i i t i i t i i t » ii iii i i i i i i i < i i i i i i i i i i 

till i t : i i t t i i t t iii t i i i i i i i t i i i i i i i i t 


GGGAACAACTCAGTGCCT ACCAGATAATGGTGATTATTCAGA—ATTGGCCCTTAATGTTACAGAAAGCTTTG 
5890 5900 5910 5920 5930 5940 5950


200 210 220 230 240 250 260 270

ATGTTT6GGCCACACPTGCCTGTGT ACCCACAGACCCCAACCCACAAGAAGT AGT ATTGGT A—AATGTGACA 

i i i t t i t t i ii lit t i i i i i t i ii tilt i t t t t i i t t i i 

I t 1 1 1 I t 1 I It t 1 ! 1 t t I t t t I II till I t I I I I 1 1 1 I 1 

ATGCTTGGGAGA-AT-ACAGT-CACASAAC-AGGCAATAGAGGACGTATGGCAACTCTTTGAGA 

5960 5970 5980 5990 6000 6010

280 290 300 310 320 330

-GAAAATTTTAACATGTGGAAAA—AT6A-CAT GGTAGAAC-AGATGCATGAGGATATAATCAG—TT 

iii i t t i i i t t i ii iii ii ii t i i i i i t i i i i i i i 

ill t I i i t i i i i t l ill il it i i l i i i i i i i ilit 

CCTC^Rm^PiGCCTTeTGTPl^APlTTPiTCCCCATTATGCftTTACTPiTGRGRTGCPiPlTAARRGTGAGACAGATA 
6020 6030 6040 6050 6060 6070 6080

340 350 360 370 380 390

TAT~-CiGG6 f— CA- AAGCCTAAA-GCCATGTGTAAAATTAACCCC-AC-TCTGTGTTAGT-TTAAAGTGCA 

• - t i i i i i t i t i t • i it tii t t i i i t t i t i i i i t t i 

fi ATIVGGG A H G AC A. A A ATC AT CA AC A AC AAT AAC A AC AGCAGCACC A ACATCAGC ACC AGT ATC AGAAAA A A 
6090 6100 6110 6120 6130 6140 6150



400 410 420 430 440 450 460 

CTGATTTGGGGAATGCTACTAATACCAATACTAS-TAATACCAATAGTAGTAGCGGGGAA-ATGA-TGATGG 


: t t 


i i i 


till 


TAGACATGCiTCAATGAGACTAGTTCTTGTA-TAGCTCAGAATAATTGCACAGGCTTGGAACAAGAGCAAATG 
6160 6170 6180 6190 6200 6210 6220


470 480 490 500 510 520 530 

AGAAASGAGAGATAAAAAACTGCTCTTTCAATATCAGCACAAGNATAAGAGGTAAGGTGCAGAAAG—A AT A 


1 I 1 


AT AAGCTGT AAATTCACCATGAGAGGGTT AAAAAGAG—ACAAGACAAAGGAGT ACAATGAAACTTGGT ACTC 
6230 6240 6250 6260 6270 6280 6290 6300 

540 550 560 570 580 590 600 

TBC—ATTT-7TTTATAAACTTGATATAATACCAATAGATAATGATACTACCAGCTATACGTTGACAAGTTG 

I ! till til T Iff I till II ( I I I t I I I I I I I I I I I III II 

t I t I I i til t lit (III II I .I I I till I t ( I II 

TACT-1GATTTGGTTTGTGAAC—AAGGGAAT AGCACT—GAT AATG AA AGC AG ATGCT AC A-TGAATCACTG 

6310 6320 6330 6340 6350 6360 

610 620 630 640 650 660 670 

T AAGACCTCAGTCATTACAOAGGCCTGT—CCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCC 


i i t i t i 


t it i 


tti i t 


T AACACTTCTETfT ATCGAAGA6TGTTGTGACAAACAT--T ATTGGGAT ACT ATT AGATTT AGGT ATTGTGCAC 
6370 6380 6390 6400 6410 6420 6430


680 690 700 710 720 730 740

CGGCTGGTTTTGC6ATTCT AAAATGT AATAAT A—AGACGTTCAATGGAACAGGAC—CATGTACAAATGT—C 

i i iiit tii t tti i t i ■ t i t i i ii i it ii i till i itii 

1 I i i f I I 1 < 1 ; I I : t I I I i I i I I I I II III l I I I I I I I 1 

CTC8AGGTTATGCTTTGCT1AGATGTAATGACACAAATTATTCAGGCTTTATGCCTAAATGTTCTAAGGTGG 
6440 6450 6460 6470 6480 6490 6500 

750 760 770 780 790 800 

AGCACAGTACAA IGTACACAT GGA—AT-TAGGC-CAGTAGTATCAACTCAACTGCTGTTGAATGGCAGTCT 

t i i lit till t t i ii ii l lit t t ii tit i ii i i i i t i i 

i i t t ;it till tit it ti i tti i i it ill i ii i i i i i i i 

TGbfl CTCTTO-A l"G—CACAAGGATGATGGAGACACAG—ACT—TCTACTTGGTTTGGCTTTAATGGAACTAG 


6510 6520 6530 6540 6550 6560 6570

810 820 830 840 850 860 870


AGCAGAAGA-AGAGGTAGTAATTAGATCTGCCAATTTCACAGACAATGCTAAAACCATAA-TAG—TACAGC 

i i ; t t i i t : : t t t t tti i t t i i i i t I i i ■ i it i i i i ill i i I 

i i ; ; i i i i ill i ii iii lit i ii i it if ii ii i i i i ill it i 

AGCAGAAAATAGAACTTATATTTA-CTGGC—ATGGTAGGGAT AA TAGGACTATAATTAGTTTAAATA 

6580 6590 6600 6610 6620 6630 6640

880 890 900 910 920 930 940

TGAACCAATCT3 T AGAAATTAATTGTAGAAGACCCAACAACAATACAAGAAAAAGTATCCGTATCCAGAGGG 

I | 1 • I ! Ill II till lilt lit I tit tit I 111 1 II III 1 

t ‘ J 1 t 1 IT! It I t I I I I I I III l III til I III I II III I 

AG : TATT ATA A7CT A ACAATG: A A A! GT AGAAG A-CCAGGA—AAT—AAG—ACAGTTT-TA-CCA-GT 

6650 6660 6670 6680 6690

950 960 970 980 990 1000 1010

GACCA—GGGAGAGCATT—TGTTACAAT—AGGAAAAATAGGAAATATGAGACAAGCA—CATTGTAACATT 

till I lit* i I t I I < I I II lilt I I I t • I I I I I » II 

tilt I I’ll | | | t t I I I t I lilt I I I I I 1 I I I I I II 

CACCATTATIiTCTGGATTGGTTTTCCACTCACAACCACTCACTGATAGGCCA-AAGCAGGCATGGT-GTT 

6700 6710 6720 6730 6740 6750 6760

1020 1030 1040 1050 1060 1070

AG—•TAGAGCAAAATCCAATGCCACTTTAAAACAGAT-AGCTAGCAAATT-AAGAGAACA-AT-TTG 

i i t i t i i i t t i i i i i i i i t t i t i t it iii lit it ii ii 

1 1 I I 1 1 I I I I t t I I I t t I 1 I I ! I I II III III It II It 

GGTT TGGAGGAAAATGiGAAGGATGCAATAAAAGAGGTGAAACAGACCATTGTCAAACATCCCAGGTATACTG 
6770 6780 6790 6800 6810 6820 6830 


1080 1090 1100 1110 1120 1130 1140

GAAAT AAT AAA'-''C—A AT—AATC—TTT AAGC AATCCT—CAGGAGGGGACCCAGAAATTGT AACGCACA—GT 


GAACTAACAATACTGATAAAA' 


t i t i i tilt t t i i t i ii li lit 

; t i t t : t i t .I ii ii lit 


G^^TTT^^Cl^i^CTCCTG^Ge^GGAGATCCGGPiA- 


i * ii it it 

it ii t 1 ti 


-GTTPiCCTTCPkTGT 


6840 6850 6860 6870 6880 6890 6900




1150 1160 1170 1180 1190 1200 

-TTTAAT rfSTCIG ACiGGGAA TTTTTCTACTGT AATTCAACACAAC.TGTTT A AT AGT ACTTGGTTT A-AT AG 


GEACAAAT T PCAGAGGAGAGTTCCTGTACTGTAA-AA—TGAATTGGTTTCTA—AATTGGGTAGAGGATAG 

6910 6920 6930 6940 6950 6960 6970


1210 1220 1230 1240 1250 1260 1270 1280


TACT TGGAGTACTE'-Y-'iGGGTCAAATAACACTGAAGGAAGTGACACAATCACACTCCCATGCAGAATAAAACA 


t i i 


tit > 


i ii ii 


GGATGTAAUTACCCAGAGGCCAAAKGA- AC-GGCATAGAAGGAATTAC-GTGCCGTGTCATATTAGACA 

6980 6990 7000 7010 7020 7030 7040


1290 1300 1310 1320 1330 1340 1350

ATTTATAftflGATBTGC-iCAGGAAGTAGGAAAAGCAAT'G-TATGCCCCTCCCATCAGCGGAGAAATTAGATGT 


i t t i l 


t i l l l l it 


AATAATCAACAC: rTGKG ATAA AGTAGGCAAA—AATGTTTATTTGCCTCCAAGAGAGGGAGACCTCACGTGT 
7050 7060 7070 7080 7090 7100 7110 


1360 1370 1380 1390 1400 1410 1420

TCA3 GAAATAT T ACAUiGGCTKCTATTAACAAGAGATGGTGGTAATAACAACAATGGGTCCGAG - ATCTTCAG 


t t l 


III l 


AACTGCACAGTGACCAGTGTGATAEGAAACATAGATTGGACTGATGGAAACCA-AACTAGTATC- 

7120 7130 7140 7150 7160 7170 


1430 1440 1450 1460 1470 1480 1490

ACCTKiGAGGAEGAGAl ATGROEGACAATTGGA—GAAGTGA ATT AT AT AAAT AT A AAGT AGT A AA AATTGA A 


I I T 


It I I I t I 


ACGATGA-iiTGCOG-AGGTSGCAGAACTGT ATCGATT AGAGTTGGGAGATTAT A AATT AGT AGAGATC ACT 

7180 7190 7200 7210 7220 7230 7240


1500 1510 1520 

CC‘Ti TAGGAGTROCACCCACCAAGl'iCAAAGAG- 


1530 1540 1550 

-AAGAGTGGT-GCA—GAGAGAAAAAAGAGCAGTGGG 


CCGtfY fTSSCTTGniCCCCCACAGATGTGA AGAGGT ACACTACTGGTGGCACCTCAAGAAATAAAAG—AG—GGG 
7250 7260 7270 7280 7290 7300 7310 


1560 1570 1580 1590 1600 1610 1620

AATAGEAB5TTTGTTCCTTGGGTTCT-TGGGAGCAGCAGGAAGCACTATGG6CGCACGGTCAATGACGCTGA 


TCTTTGTGCTAGGGTTCTTGlvGTTTTCTCGCAACGGCAGGTTCTGCAATGGGCGCGGCGTCGTTCAGGCTGA 
7320 7330 7340 7350 7360 7370 7380 


1630 1640 1650 1660 1670 1680 1690

CGGTACAGGCCAH ACR.ATT ATTGTGTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGC 


ii ii 


CCGCTCAGTCCCGGACTTTATTGGCTGGGATAGTGCAGCAACAGCAACAGCTGTTGGGCGTGGTCAAGAGAC 
7390 7400 7410 7420 7430 7440 7450 


1700 1710 1720 1730 1740 1750 1760 1770 

AAC A! iCATCT 61 TGC AACTCACAGTGTGGGGC ATCA AGCAGCTCCAGGCA AGAATCCTGGCTGTGGA AAGAT 


i i i 


AACAAGAATTGT rGCGATTEACCGTCTGGEGAACAAAGAACCTCCAGACTAGGGTCACTGCCATCGAGAAGT 
7460 7470 7480 7490 7500 7510 7520


1780 1790 1800 1810 1820 1830 1840 

AGGTAAAG^ATGAAi^AGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTT 

II II III! ■ ! II 'll t t i i i » ii i * * • i iii i t i i i * i i it i 

ACT 1 R6AGH'ACC AGGCGCAB 'TG A ATGCTTGGGG AT6TGCGTTT AG AC AAGTCTGCC AC ACTACTGT ACCAT 
7530 7540 7550 7560 7570 7580 7590 7600


1850 1860 1870 1880 1890 1900 1910

(56-.AATECTAGT tggagt aat aaatctctggaacagatttggaat aacatgacctggatggagtgggaca 


t I I • I 


I T t t I ! t 1 I ! t I I I I tilt 

GGCCAAATGCAAGT-CTAACACCAGACTGGAACA-AT-GA-TA- 

7610 7620 7630 7640 


I lit 


I I 1 I l t t I 


-CTTSGGP>FtGAGTGG&AGiC 

7650 









1920 1930 1940 1950 1960 1970 1980 

GAC-AAATTRRC-RATTRCAC'-'.RGCTTAATACATTCCTTAATTGAAGAATCGCAAAACCAGCAAGAAARGAAT 


i : i ii i 


i t i 


i iiii 


lit i i i i i 


GRAAK3TT6!RCTTGTTGGA6GRARATATRRCn~6GCCTCCTRGARGiAGGCACAARTTCAACAAGAGAAGAAC 
7660 7670 7680 7690 7700 7710 7720


1990 2000 2010 2020 2030 2040 2050

GAACAAGAATTATTGGflRTTAGATAART6GGCAAGTTTGTGG--AATTGGTTTAACATAACAAATTGGCTGTG 

i i I ; t i i t t f iii :iti i ti i iii i i i i i r i i i it i i i i i i i 

RTG'I RTGRRT1 RCRRRRGTTGRRTRGCl GGG-RTGTGTTTGGCRRTTGGTTTGRCCTTGCTTCTTGGRTRRR 
7730 7740 7750 7760 7770 7780 7790 7800 


2060 2070 2080 2090 2100 2110 2120

GTRTRTRRRRRTRTTCR—TRRTGRTRGTRGGRGGCTTGGTRGGTTTRRGRRTRGTTTTTGCTGTRCTTTCT 

i i : i i i i i ( i i t i tit lit! t tit l il i i i i i i i t t i t i l it i 

ttiit* i t ? i : i i r tit tilt i ill i il i i t i i i i i i i i i i il i 

GTA17-YT-AGRO l ATGGAAT7 TATG--TAGTTGTAGGAGTAATACTGTTAAGAATAGTGATCTATATAGTACAA 
7810 7820 7830 7840 7850 7860 7870

2130 2140 2150 2160 2170 2180 

ATAGTGAAT AGAGTT--AGGCAGGGAT ATTCACCA TTATCGTTTCAGACCCAC-CTCCCA-AC— 

il i i i t : i t i t i i i i iii tit it i i it i i t i i i t ill il 

it i t * iitt i i t t t i * t it: itt it it i i i tiit i i iii ti 

ATGCT AGC F A—AGTTAAGGCAGGGGTATAEGCCAGTGTTCTC—TTCCCCACCCTCTTATTTCCAGTAGACTC 
7880 7890 7900 7910 7920 7930 7940

2190 2200 2210 2220 2230 2240 

-CCCGAGGg&ACCCQACA- -G3CC—C-GAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACA 

(II I ! I I 1 I t It III I t I 1 I I I I 1 I I I III 1 t t I I 1 1 t 1 1 1 III 

til I I I I 1 I I I II 111 t I I I I I I I I I I I III I I I I I I I I I I I III 

ATACCCAACAGGACCCGGCACTGCCAACCAGAGAAGGCAAAGAAGGAGACGGTGGAGAAGGCGGTGGCAACA 
7950 7960 7970 7980 7990 8000 8010

2250 2260 2270 2280 2290 2300 

GATC-CATT’CGATT AGTGAACGGATCCTT AGCACTT ATCTGGGACGATCTGCGGAGC—CTTG-TGC 

III t 1 t | I III Till 1 I I I I I I I I I I I It lilt It 

111 tilt t tit 1111 I I I I I I I I t I I I II I 1 t I It 

GCTCCTGGCCTTGGCAGATAGAATATATTCATTTC-CTGATCCG-CCAACTGATACGCCTCTTGACTTGG 

8020 8030 8040 8050 8060 8070 8080

2310 2320 2330 2340 2350 2360 2370

CTC7TCAGCTACCACCCXTT6AGAGAC'! TACTCTTGATTGTA-ACGAGGATTGTGGAACTTCTGGGACGCAG 

: ; : i i i i : i i t i i i t i t i i i t l i i i i i i ill i i il 

t i : i i i t i t it t ; t i i i i i t i i i i i i i t i i t iii i i ii 

CTA‘1 TCAGtlAA-CTGCAGAACCTTGCTATCGAGAGCATACCA-GATCCTCCAACCAATACTCCAGAG 

8090 8100 8110 8120 8130 8140


2380 2390 2400 2410 2420 2430 2440

GGGGTGGGAAGCOCT- -CAAA-TATTGGTGGAATCTCCT ACAGTATTGGAGTCA-GGAACTAAAG 

t ! t r i ! ; ! t l 1 . 11 ] i t t l 1 i l l i l l l i l i i i i l 

I t I t 1 I I T I lilt* t I I t I I I I I I I I I I I I I t I 

GCTCTCTG^^^—AAGTCCT—CAGGACTGAACTGACCT ACCT AC A A 

8150 8160 8170 8180 8190 8200 X


8. KUNZ— 158—CL33„ SEO 

HIV2RODX Human immunodeficiency virus type 2 ROD isolate RN

ID HIV2RODX standard; RNA; 9671 BP.
XX
AC X05291;
XX
DT or-juM--■ 087 (annotation)
XX
DE Human immunodeficiency virus type 2 ROD isolate RNA genome
DE (HIV-2)
XX
KW acquired immune deficiency syndrome; art gene; env gene; f gene;
KW gag gene; pol gene; q gene; r gene; tat gene.
XX
OS Human immunodeficiency virus type 2
OS ROD isolate
OC Viridae; ss-RNA enveloped viruses; Retroviridae.
XX
RN [1] (bases 1-9671)
RA Alizon M.;
RT ;
RL Submitted (03-JUN-1987) on tape to the EMBL Data Library by:
RL Marc Alizon, Unite d'Oncologie Virale, and CNRS UA1157, Institut
RL Pasteur, 25 rue du Dr Roux, 75724 Paris CEDEX 15, France.
XX
RN [2] (bases 1-9671)
RA Guyader M., Emerman M., Sonigo P., Clavel F., Montagnier L.,
RA Alizon M.;
RT "Genome organization and transactivation of the human
RT immunodeficiency virus type 2";
RL Nature 326:662-669(1987).
XX
RN [3]
RA Clavel F., Guyader M., Guetard D., Salle M., Montagnier L.,
RA Alizon M.;
RT "Molecular cloning and polymorphism of the human immune deficiency
RT virus type 2";
RL Nature 324:691-695(1987).

XX 


FH Key   From    To      Description
FH
FT SITE  ■j      173     R region
FT RPT   1       259     LTR
FT SITE  1       567     HIV-2 RNA
FT                       corresponding to integrated
FT                       proviral DNA
FT SITE  174     ooc?    U5 region
FT SITE  30S     320     primer binding site
FT CDS   546     211!    gag protein
FT CDS   *s:?e   4936    pol protein
FT SITE  4513    4626    polypurine tract 2
FT CDS   , • r-,. r y ■-'-OLV^u  5513  q protein
FT CDS   5582    6596    r protein
FT CDS   5545    6140    tat protein part 1
FT                       (6140 is 2nd base in codon)
FT CDS   G07 x   6140    art protein part 1
FT                       (6140 is 1st base in codon)
FT CDS   6147    8720    env protein
FT CDS   o c? r\*~7 i  3400  tat protein part 2
FT                       (8307 is 3rd base in codon)
FT CDS   8307    8535    art protein part 2
FT                       (8307 is 2nd base in codon)
FT CDS   5557    9324    f protein
FT SITE  S925    8935    polypurine tract 1
FT SITE  £642    6467    U3 region
FT RPT   8342    93' ? 1 LTR
FT PRM   9329    9339    core enhancer sequence
FT PRM   9401    3416    core enhancer sequence
FT SITE  9420    3427    pot. SP1 factor binding site
FT SITE  9423    3437    pot. SP1 factor binding site
FT SITE  9433    3443    pot. SP1 factor binding site
FT PRM   *3465   3470    TATA-box
FT SITE          3671    R region
FT SITE  3345    5654    pot. polyA signal
FT F-C! ..Y ft  367 1  HB7 1  polyA site
XX





SQ Sequence 9671 BP; 3314 A; 1973 C; 2401 G; 1983 T; 0 Other;


Initial Score = 306 Optimized Score = 1185 Significance = 0.00
Residue Identity = 53% Matches = 1335 Mismatches = 8S8
Gaps = 323 Conservative Substitutions = 0



X 10 20 30 40 50 60

ATGiAC AG 1 Gl AAGGAGAAry fA T CAGC—ACTTOT GlaAQftTSGOOQTQGAAflTQCiGaSCACCATQCTCCTTQG 

i 1 : i i t ii t i i i i i if* t * ii i < ii»( ii i i i i 

ACCnGftCAAGTGABTATGAlGnATCAGLrraCTTATTOCC^TTTTATT—AGCTAGTGC-TTG-CTTAG 

X 6140 6150 6160 6170 6180 6190

70 80 90 100 110 120 130

biATATTG-ATG-ATCTGrAGTGCTACAGAAAAATTGTG-GGTCACAGTCTATTATGGGGTACCTG 

I : t t i t t t i I » i r i (til I i tit ti ti it it 

. t f t i i i i i i i t i r Itii l i lit ti it li li 

TATATTGCACCOnATATGTAnCTGT-TTTCTATGGCGTACCCACGTGGAAAAATGCAACCATTCCCCTCTTT 
6200 6210 6220 6230 6240 6250 6260


140 150 160 170 180 190

TGTGeAA0L3AA6CAACCACCACTCTATTTT6TG CAT-CAG-ATGCT AAAGCAT—ATGAT ACAGAGGT 

t i i i t i ittt i i i i i i » i t i i i i tit i i i t i i i i lit 

i i : t i i i t i : r t t t i i i t t i i t t t i i i t i i i i i i i i 

TGTi: .CAA-CCAGAAATAGGGA-TACTTGGGGA ACCAT ACAGTGCTTGCCTGACAATGATGATT ATCAGG— 

6270 6280 6290 6300 6310 6320 6330

200 210 220 230 240 250 260 

ACA'I aATG fTTG —GG< XACACATGCCTGTG'I ACCCACAGACCCCAAC-CCACAAGAAGTAGTATTGGTAAA 

i : t i i i i i i i i i t : i i i i i ii ii ii i itii t i i i i i r 

t i i i i lit! : i i l t i i i i i i t i it »i i till i i i 1 i i i 

AAATAA-CTTTGmATGTAACAGAGRCTTTTG-ATGCATGGAATAATACAGTAACAGAACAAGCAATAGAAGA 
6340 6350 6360 6370 6380 6390 6400 


270 280 290 300 310 320 330

TG—TGAG'AGAAAATT’i'TAACATG fGGAAAAATGACATG—GTAGAACAGATGCATGAGGATATAATCAGTTT 

ii i i i i it i i i till i i i t < i iii i ii i i i i i i 

ii i i i i ti iiii tilt i i i i i i iii i it t i i i i i 

TGTCTGGCATC7ATTCGAGACAT CA ATAAAACCATGTGTCAAACTAACACCTTTATGTGT—AGCAATGA 


6410 6420 6430 6440 6450 6460 6470

340 350 360 370 380 390 400


ATGGGATCAAAGCC7AAAGC- —CATGTGT AAAATT AACCCCACTCTGTGTT AGTTT AAAGTGCACTGATTTG 

! lit: 1 till ; - ill » l l t li II il ll lilt 

i i I t I i i i 1 i it tit t t i i i i i t t t t i till 

AATC-:CAGCAGCAi::A3AGAGCAECACAGGGAA—caacacaac-ctcaaagagcacaa GCACAACCACA 

6480 6490 6500 6510 6520 6530


410 420 430 440 450 460 

GGGAATGCTACTAA1ACCAATA—CTAG-TAA-TACCAATAGTAGTAGCG-GGGAAATGA—TGATGGA 

i itt i i t i t till ttti i i t t iii tit i iii i 

1 III I t I I t lilt llll IIII III III I III I 

accacacccac--agaccaggagc:aagagp,taagtgaggatactccatgcgcacgcgcagacaactgct-ca 

6540 6550 6560 6570 6580 6590 6600


470 480 490 500 510 520 530

gaaaggagagataaaaaactgctctttcaatatcagcacaagnataagagg—taaggtgcaga—aagaatat 

It I I I 1 . t ! I I ill II III III I I 1 I I I I I I I I I I I I I I I 

II 1 I I I I Ittt lit II til III I I I I I I I I I I 1 I I I I t I I 

GGAYTGGGAGAGGAAGAAACGATC AATTGCCAGTTCAA-TATGACAGGATTAGAAAGAGATAAGAAAAA 

6610 6620 6630 6640 6650 6660 6670

540 550 560 570 580 590 600 

GCA'i f TTT TTAT A: V'.CTTGA f ATAAT ACC A AT AG AT AATGAT ACT ACC AGCT AT ACGTTGACA AGTTGT AAC 

till t ( I l I I l I i t I I I t t II 1 l III IIII ll 

IIII I t I t I I I I t I I I I I I I II l I III III! II 

ACAGTATAAT—GAAACATGGTA-CT CAAAAGATGTGGTTTGTGAGACAAATAATAGCACAA-ATCAG 

6680 6690 6700 6710 6720 6730


610 620 630 640 650 660 670

ACC'CAGTCAT'i nCACAGGCCTG'l CCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCT 

111 IIII IIII! II 1 t II III II II II II I II I till I I 

lit I I t i : I I t i It I l ll 111 It ti ll » I I ll I llll I I 

ACC-CAGT—GTTACATGAACCATTGC—AACACATC-AGTCA TCACAGA—ATCA—TGTGACAAG—C 

6740 6750 6760 6770 6780 6790


680 690 700 710 720 730 740

GG’i T TTGCGATTCT AAA ATG fA ATAAT A—AGACGTTCAATGGAACAGG ACC—ATGT ACA AATGTCAGCA 

i : i t : t t i i t i i t ; i t i t ii it i l i i l l it itii ill 

itt it* it: it it t t i i ti i i i I i i i i ll iiii ill 

ACT ATTGGGATGCTATAAGG7TTAGATACT6TGCACCACCGGGTTATGCCCTATTAAGATGTAATGATACCA 


6800 6810 6820 6830 6840 6850 6860




750 760 770 780 790 800 

CR5TA—CO-ATGTACACA—TGSAATT AGGCCAGT AGT ATCAAC—TCAAC TGCTG-TTGAATGGCA 

I t t II l t l l ; t ll til l l t t t I l l I l it I I l » l 1 l III 

111 It ( I 1 I I I II lit till! I I I I I II I I ■ I t I t III 

-O H ATTCAGGCTTTGCACCCAACTGTTCTAAAG-TAGTAGCTTCTACATGCACCAGGATGATGGAAACGCA 
6870 6880 6890 6900 6910 6920 6930


810 820 830 840 850 860 870

GTCTAGCRGRACAAeASGTAGTAATTAGATCTGCCAATTTCACAGACAATGCTAAAACCA-TAATAGTACA 


I : I I r III : t I ; it it * i i i i i i i i i i t i i i i 

AACTTCCAGA'TT,:-GTTT LSGCTTT ft—ft—TGGCACT AGAGCAGAGAAT AGAACAT AT ATCT ATTGGCATG 

6940 6950 6960 6970 6980 6990 7000


880 890 900 910 920 930 940

GCTfciAAGCAATCTGTAGAAATTAATTG'f ACAAGACCCAACAACAATACAAGAA—AAAGTATCCGTATCCAG 

lilt i j : i t i ( • ll i i I i till I i i i lit i t i i i i ll 

GCAGAGATAA-TAGAACT—ATCAT—CA—GCTT AA ACA—AATATTATAATCTCAGTTTGCATTGTAAG 

7010 7020 7030 7040 7050 7060 


950 960 970 980 990 1000 1010

AGGGGACCAGGGA GAGCATTTGTTAC-AATAGGAAAAAT-AGGAAATATGAGACAAGCACATTGTAAC 


AGS-CCAGSGAATAAGRCABTGAAAGARATAATGCTTATGTCAGGACATGTGTTTCACTCCCACT-AC 

7070 7080 7090 7100 7110 7120 7130

1020 1030 1040 1050 1060 1070

ATT AGTAGHGC AAAATGCAATG3C7ACTTTAAAACA-GATAG—CTA—QC AAATT AAGAGA—ACAAT—T 

II I t I I I J I 1 I I 1 1 t I I III II . lit! Ill 

II I I I I I I I I f 1 I lilt III t I I 1 I I I I till III 

—CAGCCGATCAATA-ARAGACCCAGACAAGCATGGTGCTGGTTCAAAGGCAAATGGAAAGACGCCATGC 

7140 7150 7160 7170 7180 7190 

1080 1090 1100 1110 1120 1130 1140

TGGAAAT—AATAAAACAATAATCTTTAAGCAATCCTCAGG-AGGGGACCCAGAAATTGTAACGCACAGTT 

: t f t ! : til i t t t iii.iiii i i i i t i t till ti it 

lit iii iiit t t it i t i i • t i i i i i i i i t till i i ti 

AGGAGGTGAAGGAAACCCT-TGCAAAACATCC-CAGGTATAGA6GAACC-AAT—GACACAAGGAATA 

7200 7210 7220 7230 7240 7250 7260 


1150 1160 1170 1180 1190 

TTAATTGTGGAG-GGGAATTTTTC—TAC-TGTA—ATTCA-ACAC—AACTGTTT AAT AGT AC 

tit i t i i t iiii ii ii iii i i • i i i t i • i • i tit 

i i : titti iiit i i ti iii titi i i i i i i i i iii 

TTARCTTTGCAGCGCCAGGAAAAGECTCAGACCCAGAAGTAGCATACATGTGGACTAACTGCAGAGGAG-AG 
7270 7280 7290 7300 7310 7320 7330 


1200 1210 1220 1230 1240 1250 1260 

TTGGTTTAATAGTACTTGGAGTACTGAAGGGTCAAAT-AACACTGAAGGAAGTGACA-CAATCACACTCC 

t i till ft i ii t ii i i i i i it iii ill iii i i i i i i i 

t i (lit t i I t t ii i i i i i i i ill tit iii i t i t i i i 

TTTf J TCTACTGCAACAT-GACT—TGGTTCCTCAATTGGATAGAGAA—T AAGACAC-ACCGCAATT ATGC-AC 
7340 7350 7360 7370 7380 7390 


1270 1280 1290 1300 1310 1320 1330

CATGCAGAATAAAACAATTTATAAACATGTGGCAGGAAGTAGGAAAAGCAATG—TATGCCCCTCCCATCAG 

i : i i it'll iti i j i tilt t i i t t i t v t i t it till lit i i i i i t i 

t t t i t t i i t tit i t ti^t t i i t i i i » t ( » t t iiii itt i i i » i t i 

CGTGCCATATAAAGCAAAT AATT AACACATGGCAT AAGGTAGG—GAGAAATGTATATTTGCCTCCCA- 

7400 7410 7420 7430 7440 7450 7460 


1340 1350 1360 1370 1380 1390 1400

C65ACAAATTASATGTTCATCAAATATTACRG—GGCTGC—TATTAACAAGAGATGG—TGGTAATAACARC 


GGGA'-AGGGG AGt 1TGTCGTGCAACT—GARCAG T AACCAGCAT AATTGCTA—ACATTGACTGSCAA—AACAAT 
7470 7480 7490 7500 7510 7520 7530 


1410 1420 1430 1440 1450 1460

AATSGGT CCGREAT-CTTCAGACCTGGAGGAGG-AGA—TATGAGGGACA-ATTGGAGAAGTGAATTATA 


AATCAGACAA^OATTRCCTTTAGTBC-AGAGGTGGCAGAACTAT-ACAGATTGGAGTTGGGAGAT—TA 

7540 7550 7560 7570 7580 7590 






1470 1480 1490 1500 1510 1520 1530

TAAATATAAATTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGA-AGAGTGGTGCAGA 


TAAAT-T GGT AGAAAT AACACGAATTGGCTTGGCAGCT AGAAAAGAAAAAAGAT ACTCGTCTGCTCA 

7600 7610 7620 7630 7640 7650 7660


1540 1550 1560 1570 1580 1590 1600

GASAAAAAnCAGCAeTGG-GAATAGGAGCTTTGTTCCTTGGGTTCT-TGGGAGCAGCAGGAAGCACTATGGG 


CGfcOnGACATAOnAGAGSTerrGTTCGTGCTAGGGTTCTTGGGTTTTCTCGCAACAGCAGGTTCTGCAATGGG 
7670 7680 7690 7700 7710 7720 7730 


1610 1620 1630 1640 1650 1660 1670

CGCACGGTCA;'il 6 ACuCTGACSGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATTT 

t t t iii iiii it tit tit it i tit ti i it .. i i i i t iti i i i 

iti iti itii it itt iii it ■ iii it ■ <■ t i t t i i t i ■ i i iii i < i 

CGCGGCGTCCCTGACCGTGTCGGCTCAGTCCCGGACTTTACTGGCCGGGATAGTGCAGCAACAGCAACAGCT 
7740 7750 7760 7770 7780 7790 7800 


1680 1690 1700 1710 1720 1730 1740 1750

GCTGAGGGCTA'iTEAGGCGCAACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAG 

1 | ; 1 11 tilt! t I 1 l I t I t ill II i i i l i i t i t it I ■ I 1 I 1 I 1 I I I I 

; t I i it i t i : t r l i 1 I i i I ill ti i i i I i I l i t ti i I I I I I i I I i I i 

UTTSGACGfGGTCAAGAGACAACAAGAACTGTTGCGACTGACCGTCTGGGGAACGAAAAACCTCCAGGCAAG 
7810 7820 7830 7840 7850 7860 7870

1760 1770 1780 1790 1800 1810 1820

AATGCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGGATT—TGGGGTTGCTCTGGAAAACTC 

I I ! Ill T It I t I I I I I lit) II II II III I I I I I It I I It 

it tit t i : t i t i t i i i i t i i ti it iii i i i i i ti I i it 

AGTCACTECTATASAGAAGTACCTACAGGACCAGGCGCGGCT—AAATTCATGGGGATGTGCGTTTAGACAA 
7880 7890 7900 7910 7920 7930 7940

1830 1840 1850 1860 1870 1880 1890 

ATI 1 GCACCA17! GCTGTGCCT'TGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGGAATAACATG 

t tit i i • i i i i ; i i t t i i i i i i i iiii i i i i i i i i i 

lit: t : t t I I 1 1 I I 1 t II t 1 t I I I I IIII I I I I I I I I I 

GTC;GCCACA5-['ACTGTACCATGG-GT-TAATGATTCCTT AGCACCTGACTGGGACAAT ATG 

7950 7960 7970 7980 7990 8000 


1900 1910 1920 1930 1940 1950 1960

ACCTGGAT!Ti4G4‘GGGACAGAGAAATT AACAATT ACACAAGCTT AAT A-CATTCCTT AATT—GAAGAATCGC 

t t t i i i i * ; i t i i i i : i i t i i ii i t l i i i t i i l l i i ii 

i i i i i i ’ i i t i i t ; i t t t i i ii i ■ i i t i i i i i i i i ii 

ACG'i QGC A! 35A ATGGGA AAA AC AAGT—CCGCT ACCT GGAGGCA AAT ATCAGT AA AAGTTT AG AACAGGC AC 
8010 8020 8030 8040 8050 8060 8070 


1970 1980 1990 2000 2010 2020 2030

AAhACCAGCAmGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTA 

1 ! : l i I ! : itt i lit i i i i l I t III IIII l III l i i i i i i i t i 

t i . t i i t t : • it iti i tit ii i i > t i iii iiii i iii i i i i i i i i i i 

A AA"! TCAGCAAGAEAAAAAT f iTGTATGA ACT ACAAAAATT AAAT AGCTGGGAT ATTTTTGGCAATTGGTTTG 
8080 8090 8100 8110 8120 8130 8140 8150 

2040 2050 2060 2070 2080 2090 2100 

ACA IA AC AA ATT GGC‘1 GTGGTATA--T A AAAAT—ATTC AT AATGAT AGT AGGAGGCTTGGT AGGTTT AAGAA 

ti till iii : t i i I i > ti ti t i i ii i t i i i i i ii i ill ■ i t I t I I i 

It iiii r ; t i t t t l t f ■■ t i t i i it l i l t i i i ii i ill i i i i l i i i 

ACT (A AGO I'CGTITGCTCAAE' .'ATATTCA AT ATGGAGTGCTT AT AAT AGT AGCAG T AAT AGCTTT AAGAA 

8160 8170 8180 8190 8200 8210 8220


2110 2120 2130 2140 2150 2160

TAGVTTTTGCI 5 t'ACTTT CTATAuf GiAA T AGAGTTAGGCAGGGAT'ATTCACC—ATTATCGTTTC- 

tilt i iiii i ti i i iii iiii iiii lit ii ii ii ill 

t i i i tilt < i i t l ill till iiii ill II ii II III 

TAGTGATAlATG f AGTACAAATGTl AAGTAGGCTTAGAAAGGGCTATAGGCCTGTTTTCTCTTCCCCCCCCG 
8230 8240 8250 8260 8270 8280 8290


2170 2180 2190 2200 2210 2220

-AGACG'OAGG-IU3CAACCCCGAGGGGACC—CGACAG-GCC—CGA AGG A AT AG AAGAAG AAGGT 

| t 111 t ITI It I I I I I I I I I I I III I I I I < II I I I I I I I I I III 

t i ■ ii r i < t : t i i i i i i I tit i I i i i ii i i i i i i i i i ill 

ATfATATCCAACAEATCCATATCCACA-AGGACCGGGGACAGCCAGCCAACGAAGAAACAGAAGAAGACGGT 
8300 8310 8320 8330 8340 8350 8360






2230 2240 2250 2260 2270 2280 2290

GBALnGAGAGRCnGPLACAGATCCATTCBATTAGTGAACGGATCCTTAGCACTTATCTGGGACGATCTGCGG 

, , , | t t I I t I I t I t T t I II III I I I 1 1 I III II till 

t t t i i r i . * t i i f < t i i ii lit i i i i i i ill ti lilt 

GtPAGCPACLeiGGPBACAGATRCTGGCCCTGGCCGATAGCATATATA-CA-TTTCCTGATCCG-CCAGCTG 
8370 8380 8390 8400 8410 8420 8430


2300 2310 2320 2330 2340 2350 2360


AGCO rTGiTGCOTCTT-CAG—CTA—CCA—CG6CTTGAGAGACTTRCTCTTGRTTGTARC GAGGATTGTG 


l i t i i l ■ 


p-rT—CGCCTC TTGPCCAb-.ACT AT ACAGCA TCTGCAGGGACTT ACT ATCCA GGAGCTTCCTGACCCTC 

8440 8450 8460 8470 8480 8490

2370 2380 2390 2400 2410 2420 2430

BRACT—Tt.TFGGB ACGnAGBGCiGTGGGP AGCGCTG AAAT A—TTGGTGG AATC—TCCT AC AGT ATTGG AGTCA 

I I I J Ill II'ii i ii t i i i i i i > i t t i i i i t i i i i i 

CAPUTCATCT-.—AC—*• -AGAATCTCAGAGACTGGCTGAGACTTAGAACAGCCTTCTTGCAATATGGGTGCGA 

8500 8510 8520 8530 8540 8550 8560


—GL.PAC1 P-—RAG 

i i : i i ill 

8570 8580


9. KUNZ- 156-CL38 360 

RESIV251 Simian immunodeficiency virus (Mac251) envelope ge

ID RESIV251 standard; DNA; 1142 BP.
XX
AC X06879; Y00294;
XX
DT 2: ; - -j'UN-t 988 < ?cc?i added)
DT 26-MAY-1988 (annotation)
XX
DE Simian immunodeficiency virus (Mac251) envelope gene DNA (part.)
DE integrated copy
XX
KW env gene; envelope gene.
XX
OS Simian immunodeficiency virus
OC Viridae; ss-RNA enveloped viruses; Retroviridae.
XX
RN [1] (bases 1-1142)
RA Kestler H.W.;
RT ;
RL Submitted (25-FEB-1988) to the EMBL Data Library by:
RL Kestler H.W., Harvard Medical School, New England Regional Primate
RL Research Center, Department of Microbiology, One Pine Hill Drive,
RL Southborough, Mass., 01772, USA.
XX
RN [2]
RA Kestler H.W., Li Y., Naidu Y.M., Butler C.V., Ochs M.F.,
RA Jaenel G., King N.W., Daniel M.D., Desrosiers R.C.;
RT "Comparison of simian immunodeficiency virus isolates";
RL Nature 331:619-622(1988).
XX
CC *source: strain=Macaca mulatta 251 (host); clone=lambda SIV 251;

XX 


FH Key   From    To      Description
FH
FT SITE  <1      114S    put. env gene
FT                       (1 is 2nd base in codon)
FT SITE  1OSS    loss    in frame stop codon
XX




SQ Sequence 1142 BP; 322 A; 208 C; 273 G; 231 T; 108 Other;


Initial Score = 259 Optimized Score = 565 Significance = 0.00
Residue Identity = 51% Matches = 622 Mismatches = 488
Gaps = 83 Conservative Substitutions = 0


1140 1150 1160 1170 1180 1190 1200

ACAGTTT'I 'A AT'I STor'AGGGGAAl TTTTCTACTGTAATTCAACAGAACTGTTTPATAGTACTTGGTTTA—A 


i i t i t t t t I i t ■ i i i i t i i t i » i i ii ii ii ii it t i til i l 

i;^Ci->tinGC?iPir,iFtl 3 T'l''CCTCTRCTGTRR-RR—TGRRTTG 6 TTTGTR“RRTTC 5 C 3 GiTR 6 RGGR 

10 20 30 40 50 60


A 


1210 1220 1230 1240 1250 1260 1270 

TRR 1 RCT 1 ObiRCiTROl uRRGGGTCRRRl RRCRCTGRRGGRRBTGRCRCRRTCRCRCTCCCRTGCRGRRTRRR 

lit t lilt I 1 : I I l 1 ill til II I t I I I I I I I I ill 

TAG0a3ATGTAACrfP.CCCAGAGGCCAAAGGA-AC-GGCAT AGAAGGAATT AC—GTGCCGTGTCAT ATT AG 

70 80 90 100 110 120 130

1280 1290 1300 1310 1320 1330 1340

ACAATTT P. fAAACATGTGGCAGGAAGT AGGAAAAGCAATG—TATGCCCCTCCCATCAGCGGACAAATTAGA 

ii:i i it tit: i i i i i i i i t t i i lit i i i i tit i i i i i t iii i i i 

I t t I It till t 1 1 I I ! 1 t 1 I t I 111 till III I I I I I I lit I I I 

Al V-VAAT AATC AACAClTGECAT AA AGT AGGCAAA—AATGTTTATTTGCCTCCAAGAGAGGGAGACCTCACG 
140 150 160 170 180 190 200

1350 1360 1370 1380 1390 1400 1410

TI371 GATCAAATA7 TACAGGGCTGCTAT TAACAAGAGATG6TGGTAATAACAACAATGGGTCCGAGATCTTC 

it: t i : tif i:i it ii i I i i i t ill till t i i i i i 

til i p i t i tit ti ii t i i i i i iii till i till i 

TGTAACTCi .V-' 3 AGTGACCAGTCTCAT AGCAAACAT AGATTGGACTGATGGAAACCA-AACT A AT AT CACC 

210 220 230 240 250 260 270 

1420 1430 1440 1450 1460 1470 1480 

AGACCTS—P'-v :2APt.: ARATATRAGGGACAATTGGAGAAGTGAATT AT AT AAAT AT AAAGT AGT AAAAATT 

i ti ’itt it tt; i t i i t t t t i t i ii i i i i i t i t i i i i i i t i 

I It ’ f * T t I T t t ! 1 I I I I I t I 1 1 I 1 I 1 I I 1 I I I 1 I I I I It 

ATCAGTGCAg; 2;.,:TGGCAGAACT—GT ATCGA TTGGAGTTGGGAGAT-T AT AAAT-T AGT AGAGATN 

280 290 300 310 320 330

1490 1500 1510 1520 1530 1540 1550

r3AAt::GATT2.20-Ar.!7AGCACCCACCAAaGCAAftGAS-AAGAGTGGT—GCA—GAGAGAAAAAAGAGCAGT 

i i t t : i i I I i i i i I i till til i i i i i ti it II 

t till! t t i i i i i t 1 till III I 1 I I I II ii it 

NM3NNMMisliviv:,.!!'«!i\;;\:-.!i'.!CCCCr.ACAGATGrGAAGAGGTACACTACTGGTGGCACCTCAAGAAATAAAAG-AG- 
340 350 360 370 380 390 400 

1560 1570 1580 1590 1600 1610 1620

GGKviHTAGahV:: H ;, ‘TCi: fTGGG ffCT-TGGGAGCAGCAGGAAGCACTATGGGCGCACGGTCAATGACGC 


GGG.fl CTTTi 1 f ICTAGGGTTC- ff GC-.GTTTTCTCGCAACGGCAGGTTCTGCAATGGGCGCGGCGTCNNNNNNNN 
410 420 430 440 450 460 470

1630 1640 1650 1660 1670 1680 1690

TGACtGGTAnrruSrsr.lACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGG 


i lit 


i i i 


NMMCCGCTCAG<OCCFXiACOTTATTG6UTGGGATAGTGCAGCAACAGCAACAGCTGTTGGACGTGGTCAAGA 
480 490 500 510 520 530 540

1700 1710 1720 1730 1740 1750 1760 

CGCAACAGCATCT6TV6CAACTCACAGTCTGEGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAA 

T t I I I t t I t t t tit II I 1 I I I 1 I I I III I I I I I I I t II II II I II I 

: ' t t I : t * . t t tit t I 1 I 1 I t 1 1 I t lit t t I I I I I I it I I II I II I 

GACAACAAGAATIGT-GCGAt ITGACCGTCTGEGGAACAAAGAACCTCCAGACTAGGGTCACTGCCATCGAGA 
550 560 570 580 590 600 610 620


1770 1780 1790 1800 1810 1820 1830 1840

GATACCTAMACGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGC 


ill i 


A : :rror;TTA;'V7,GG;-',.;5/GGCG:5AGCTGAATGCTrGGGGATGTGCGTTTAGACAAGTCTGCCACACTACTGTAC 
630 640 650 660 670 680 690 










1850 1860 1870 1880 1890 1900

CTTGG-AATLCTAGTTGEAGTAATAAATCTC.TGGAACAGATTTGGAATAACATGACCTGGATGGAGTGGG 

i t i i t t i t t i t i iii i t i i i t i t i it it it i it* t t t i i t t 

l ill l t i 1 i ill III t l l l l i i i t it It It l III l l i i i i l 

CATGSCCAAATCCAACiT-CTAACACCASACTG6AACA-AT-GA-TA-CTTGGCAAGAGTGGG 

700 710 720 730 740 750 


1910 1920 1930 1940 1950 1960 1970 1980

ACAGAGAAAT TTY-' 1 ")— '-'ATTACACAA3CTTAATACATTCCTTAATTGAAGAATCGCAAAACCAGCAAGAAAAG 

\ • t : lilt tt • t it ill ill i I i I i i i t i i i I i i i I i i I i I 

AL-iCl-vAAAeGTTrACTTCTTGGAGGAAAATATAACA-GCCCTCCTAGAAGAGGCACAAATTCAACAAGAGAAG 
760 770 780 790 800 810 820


1990 2000 2010 2020 2030 2040 2050

AATCnACAAPAATTATTGGAATTAGATAAATGGGCAAGTTTGTGG-AATTGGTTTAACATAACAAATTGGCT 

II | | • 1 I I I ! 'I III till I It I 111 til 

AACATSTATGAAnACAAAA6TTGAATABCTGBG-ATGTGTTTGGCAATNNNNNNNNNNNNNNNNNNNNNNN 
830 840 850 860 870 880 890 

2060 2070 2080 2090 2100 2110 2120

G'niiET AT ATAAAAAT A' TTCAT AATGAT AGTAGGASGCTT6GT AGGTTT AAGAAT AGTTTTTGCTGT ACTTTC 

i lilt 

pinnhnnpmnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnngatctatatagtaca 


900 910 920 930 940 950 960

2130 2140 2150 2160 2170 2180 2190


T ATAGTGArVT AGAGT1 -■ AGGCABGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGG— 

it I * : * t I t t i I I t r I lit I t t t i I it i t I I I I I I I II I 

AA1 IdCTABPTr-V AGTTAAGGOAGC-iliGTATAGGCCAGTGT—TCTCTTCCCCACCCTCTTATTTCTAGCAGAC 
970 980 990 1000 1010 1020 1030


2200 

-AGCCPAOA'F 


2210 2220 2230 2240 2250 

-AAGGAA1AGAAGAAGAAGGTGGA6AGAGAGACAGAGACAGATCCATTCGAT 


TC/V1 AQJCAj vJ. \<'2,'iid ICGuCY-VJTfjCCAACCAG—AbiAAGGCAAAGAPiGGAGACGGTG—GAGA-AGGCGGT 

1040 1050 1060 1070 1080 1090 1100

2260 2270 2280 2290 2300

TAGTGAACGEATt ICTTAGCACTI ATCTG—GGACGATCTGCGGAGC 


-Gif: -CAACAC-.Cn c 3C-TGGC- UTTGGCAGATAGAATATATTCATTTC 
1110 1120 1130 1140 X


10. KUNZ~ X53 -CL 3 -1, SET! 

M15127 Figure 1. Structure of the art gene of HTLV-III. C

ID M15127 unannotated; xxx; 306 BP.

XX 

AC M15127;
XX 

DT 30-JUL-1989 (incorporated)

XX 

DE Figure 1. Structure of the art gene of HTLV-III. Coding Exon II.

XX 

KW 

XX 

OS 

OC

XX 

RN [1] (bases 1-306)

RA Goh W.C., Sodroski J.G., Rosen C.A., Haseltine W.A.;

RT "Expression of the art Gene Protein of Human T-Lymphotropic Virus

RT Type III (HTLV-III/LAV) in Bacteria";

RL J. Virol. 61:633-637(1987).

XX

FH Key From To Description







FH 

XX 

SQ Sequence 306 BP; 35 n$ ss c; 87 G; 66 T; 0 Other;

Initial Score = 184 Optimized Score = 298 Significance = 0.00

Residue Identity = 33% Matches = 239 Mismatches = 2

Gaps = 1 Conservative Substitutions = 0


2140 2150 2160 2170 2180 2190 2200 2210

TAGGCAGGAATATTCACCAT TATGGiTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACAGGCCCGAAGG 

g t t t t i t i : i i : ' * i i t t i t t i i t t t ! i t i i i i > i i i t ! i i ) t t t t i t > i i i i i i i i i t i i t i i t t i i > i i 

TAGGCAGGGATRTTCACCATTATCGTTTCAGACCCACCTCCCAATCCCGAGGGGACCCGACAGQCCCGAAGG 
X 10 20 30 40 50 60 70


2220 2230 2240 2250 2260 2270 2280

AATAGAAGAAGAAGGTGGAGAGAGTaGACAGAGACAGATCCATTCGATTAGTGAACGGATCCTTAGCACTTAT 

t ) : t t t t t t t i : . t i i i i t i i : i t i t i i t i t t i i i i i i i t i t i i ■ i t i i i t i i i i i i i i t i i i i t t i i i i i i 
i t i t i i t i i : i t : i i t : i i i i i i t t t i i i l i i i i i i i i i t t t i i i i t i i i i t i i i i l i i i l i i i i i i i i i i i 

RAT AGRAGAAGAAGGTGGRGAGAGAGACAGAGACAGATCCATTGGATTRGTGRRCGGRTCCTTAGCACTTAT 


80 90 100 110 120 130 140

2290 2300 2310 2320 2330 2340 2350


CTGGGACGATCTGCGCLiAGCCTTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGA 

i t i t t i i i i i ; i i i ■ i i i i i t i i t i t t i i 1 i i i i i t i i i i t r i i i i i > i i i t t i i i i t i i i i i i i i t t i t i 
i i i t t i i i i i i : i ’ > < t ; i t i t i t i t t t i t t t t t l i i i i i i i t i i i i i i i > i i i i i i i t i t t i t i i i i i i i 

CTGeGACGATCTGCeGAGCC—TGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGA 


150 160 170 180 190 200 210

2360 2370 2380 2390 2400 2410 2420


GGATTGTGGAACTTCTGGGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGA 

i < t i i i i i i i ! t t i i i I i i i i i i t : i i i t i i i i ! i i i i t i < t i i t i i i i i i i i i t i i i i r I i i t i i i i i i t 

i i i t l i t i < i • t t i : i ; t i i ? i i i i t t i l l i t ( l i i i i i i i i i t t t i i i i i i i i i i i i i i i l i i ■ i i i i i i i 


GGATTGTGCAACTTCTGGGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGA 


220 230 240 250 260 270 280


2430 X 

GTCAGGAACTAAAG 

i i i t i i i : l i t i ; 
i t i t I i i i t I 1 I ! 

GTCAGGAGC.TAA.AG 
290 300




